To run a scoring model
in Hadoop, follow these steps:
-
Create a traditional
scoring model by using SAS Enterprise Miner, or an analytic store scoring
model by using the SAS Factory Miner HPFOREST or HPSVM component.
-
[Optional] Create a
metadata file for the input data file.
The metadata file has
the extension .sashdmd and must be stored in the HDFS. Use PROC HDMD
to generate the metadata file.
Note: You do not have to create
a metadata file for the input data file if the data file was created
with a Hadoop LIBNAME statement that contains the HDFS_DATADIR= and
HDFS_METADIR= options. In this instance, metadata files are automatically
generated.
Note: SAS/ACCESS requires Hadoop
data to be in Hadoop standard UTF-8 format. If you are using DBCS
encoding, you must take the character length in the
engine-generated SASHDMD metadata file and multiply it by the number
of bytes in a single character to obtain the correct byte
length for the record.
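As a sketch of this step, the following shows a Hadoop LIBNAME statement with the HDFS_DATADIR= and HDFS_METADIR= options, followed by PROC HDMD describing a comma-delimited input file. The server name, libref, paths, and column definitions are placeholders, not values from this document:

```sas
/* Hypothetical connection values; substitute your own server, user, and paths */
libname hdp hadoop server='mysrv' user=myuserid
        hdfs_datadir='/user/myuserid/data'
        hdfs_metadir='/user/myuserid/meta';

/* Describe the delimited input file so that a .sashdmd metadata file
   is written to the HDFS metadata directory */
proc hdmd name=hdp.custdata
          format=delimited
          sep=','
          data_file='custdata.txt';
   column custid  int;
   column income  double;
   column state   char(2);
run;
```

Because HDFS_METADIR= is set on the libref, tables written through it get metadata files generated automatically; PROC HDMD is needed mainly for files that already exist in HDFS.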
-
Specify the Hadoop connection
attributes in the INDCONN macro variable. For example:
%let indconn=user=myuserid;
-
Run the %INDHD_PUBLISH_MODEL macro.
With traditional model
scoring, the %INDHD_PUBLISH_MODEL macro
performs the following tasks using some of the files that are created
by the SAS Enterprise Miner Score Code Export node: the scoring model
program (score.sas file), the properties file (score.xml file), and
(if the training data includes SAS user-defined formats) a format
catalog:
-
translates the scoring model into
the sasscore_modelname.ds2
file that is used to run scoring inside the SAS Embedded Process.
-
takes the format catalog, if available,
and produces the sasscore_modelname_ufmt.xml
file. This file contains user-defined formats for the scoring model
that is being published.
-
uses SAS/ACCESS Interface to Hadoop
to copy the sasscore_modelname.ds2
and sasscore_modelname_ufmt.xml
scoring files to the HDFS.
With analytic store
scoring, the %INDHD_PUBLISH_MODEL
macro takes the files that are created by the SAS Factory Miner HPFOREST
or HPSVM components: the DS2 scoring model program (score.sas file),
the analytic store file (score.sasast file), and (if the training
data includes SAS user-defined formats) a format catalog, and copies
them to the HDFS.
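A hypothetical invocation for a traditional model is sketched below. It assumes the Score Code Export node output was saved to a local directory; the directory names, model name, and the exact parameter names are assumptions, so check the macro reference for your release:

```sas
%let indconn=user=myuserid;

%indhd_publish_model(
   dir=C:\models\credit,             /* local folder with score.sas, score.xml */
   datastep=score.sas,               /* scoring model program */
   xml=score.xml,                    /* properties file */
   modeldir=/user/myuserid/models,   /* HDFS target directory */
   modelname=creditscore,
   action=create);
```

The macro translates score.sas into sasscore_creditscore.ds2 (and, if a format catalog is supplied, produces sasscore_creditscore_ufmt.xml) and copies both files to the HDFS directory named in MODELDIR=.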
-
Run the %INDHD_RUN_MODEL macro.
The %INDHD_PUBLISH_MODEL macro publishes the model to Hadoop, making the model
available to run against data that is stored in the HDFS.
The %INDHD_RUN_MODEL macro starts a MapReduce job that uses the files generated
by the %INDHD_PUBLISH_MODEL macro
to execute the DS2 program. The MapReduce job stores the DS2 program
output in the HDFS location that is specified by either the OUTPUTDATADIR=
argument or by the <outputDir> element in the HDMD file.
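Continuing the sketch above, a run of the published model might look like the following. All paths and the parameter names are assumptions for illustration; consult the macro reference for your release:

```sas
%indhd_run_model(
   inputdatadir=/user/myuserid/data/custdata,            /* HDFS input data */
   inputmetadir=/user/myuserid/meta/custdata.sashdmd,    /* its .sashdmd file */
   outputdatadir=/user/myuserid/output/scores,           /* where scores land */
   outputmetadir=/user/myuserid/meta/scores.sashdmd,
   scorepgm=/user/myuserid/models/creditscore/sasscore_creditscore.ds2);
```

The MapReduce job runs the DS2 scoring program inside the SAS Embedded Process and writes its output to the directory given by OUTPUTDATADIR=.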
-
Submit an SQL query
against the output file.
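For example, the scored output can be read back through a Hadoop libref and queried with PROC SQL. The libref, server, paths, and output column names here are placeholders:

```sas
/* Hypothetical libref pointing at the scoring output and its metadata */
libname hdp hadoop server='mysrv' user=myuserid
        hdfs_datadir='/user/myuserid/output'
        hdfs_metadir='/user/myuserid/meta';

proc sql;
   select custid, em_classification
      from hdp.scores;
quit;
```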