To run a scoring model in Hadoop, follow these steps:
Create a scoring model using SAS Enterprise Miner.
[Optional] Create a metadata file for the input data file.
The metadata file has the extension .sashdmd and must be stored in the HDFS. Use PROC HDMD to generate the metadata file.
Note: You do not have to create a metadata file for the input data file if the data file is created with a Hadoop LIBNAME statement that contains the HDFS_DATADIR= and HDFS_METADIR= options. In this instance, metadata files are automatically generated.
Note: SAS/ACCESS requires Hadoop data to be in Hadoop standard UTF-8 format. If you are using DBCS encoding, you must extract the value of the character length in the engine-generated SASHDMD metadata file and multiply it by the number of bytes of a single character in order to create the correct byte length for the record.
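The DBCS adjustment in the note above is a single multiplication. The following is a minimal sketch of that arithmetic, written in Python only for illustration. The function name and the sample values (a character length of 20 and 2 bytes per character) are assumptions and are not taken from any SASHDMD file.

```python
def dbcs_byte_length(char_length: int, bytes_per_char: int) -> int:
    """Return the byte length for a character length read from a SASHDMD file.

    char_length    -- character length extracted from the metadata file
    bytes_per_char -- number of bytes of a single character in the encoding
    """
    if char_length < 0:
        raise ValueError("char_length must not be negative")
    if bytes_per_char < 1:
        raise ValueError("bytes_per_char must be at least 1")
    return char_length * bytes_per_char


# Hypothetical example: 20 characters at 2 bytes each.
print(dbcs_byte_length(20, 2))  # prints 40
```

The correct value for bytes_per_char depends on the DBCS encoding in use and must be determined for your own data.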
Connect to the HDFS using this command.
%let indconn=hdfs_server=myhdfsserver hdfs_port=8020 user=myuserid;
Run the %INDHD_PUBLISH_MODEL macro.
The %INDHD_PUBLISH_MODEL macro uses some of the files that the SAS Enterprise Miner Score Code Export node creates:
the scoring model program (
the properties file (score.xml)
a format catalog (if the training data includes SAS user-defined formats)
The %INDHD_PUBLISH_MODEL macro translates the scoring model program into a DS2 program and, if needed, generates an XML file for the user-defined formats. Then all model files (the SAS program, the DS2 program, the score.xml file, and the XML file for user-defined formats) are copied to the HDFS.
Connect to the MapReduce JobTracker using this command.
%let indconn=hdfs_server=myhdfsserver hdfs_port=hdfsport
mapred_server=mapred-server-name mapred_port=mapred-port-number;
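The two INDCONN assignments in these steps differ only in which key=value pairs they carry: the HDFS connection uses hdfs_server, hdfs_port, and user, and the JobTracker connection adds mapred_server and mapred_port. As a rough sketch, assuming you generate the SAS statement from a script, a helper such as the following could assemble either form. The function indconn_statement is hypothetical and is not part of any SAS product. It uses only the keys shown in the commands above.

```python
from typing import Optional, Union


def indconn_statement(
    hdfs_server: str,
    hdfs_port: Union[int, str],
    user: Optional[str] = None,
    mapred_server: Optional[str] = None,
    mapred_port: Optional[Union[int, str]] = None,
) -> str:
    """Build a '%let indconn=...;' statement from the given connection values."""
    # The MapReduce server and port only make sense as a pair.
    if (mapred_server is None) != (mapred_port is None):
        raise ValueError("mapred_server and mapred_port must be given together")

    parts = [f"hdfs_server={hdfs_server}", f"hdfs_port={hdfs_port}"]
    if user is not None:
        parts.append(f"user={user}")
    if mapred_server is not None:
        parts.append(f"mapred_server={mapred_server}")
        parts.append(f"mapred_port={mapred_port}")
    return "%let indconn=" + " ".join(parts) + ";"


# Reproduces the HDFS connection command shown earlier in these steps.
print(indconn_statement("myhdfsserver", 8020, user="myuserid"))
```

The generated text still has to be submitted in the SAS session before the publishing or run macro is called.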
Run the %INDHD_RUN_MODEL macro.
The %INDHD_PUBLISH_MODEL macro publishes the model to Hadoop, making the model available to run against data that is stored in the HDFS.
The %INDHD_RUN_MODEL macro starts a MapReduce job that uses the files generated by the %INDHD_PUBLISH_MODEL macro to execute the DS2 program. The MapReduce job stores the DS2 program output in the HDFS location that is specified either by the OUTPUTDATADIR= argument or by the <outputDir> element in the HDMD file.
Submit an SQL query against the output file.