To run a scoring model
in Hadoop, follow these steps:
-
Create a traditional
scoring model by using SAS Enterprise Miner, or an analytic store scoring
model by using the SAS Factory Miner HPFOREST or HPSVM component.
-
[Optional] Create a
metadata file for the input data file.
The metadata file has
the extension .sashdmd and must be stored in the HDFS. Use PROC HDMD
to generate the metadata file.
Note: You do not have to create
a metadata file for the input data file if the data file was created
with a Hadoop LIBNAME statement that contains the HDFS_DATADIR= and
HDFS_METADIR= options. In this instance, metadata files are automatically
generated.
Note: SAS/ACCESS requires Hadoop
data to be in Hadoop standard UTF-8 format. If you are using DBCS
encoding, you must take the character length in the
engine-generated SASHDMD metadata file and multiply it by the number
of bytes in a single character to obtain the correct byte
length for the record.
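As a sketch of this step, the following shows a Hadoop LIBNAME statement with the HDFS_DATADIR= and HDFS_METADIR= options, followed by PROC HDMD describing a comma-delimited input file. The server name, libref, paths, and column definitions are placeholders, not values from this document:

```sas
/* Hypothetical connection values; substitute your own server, user, and paths */
libname hdp hadoop server='mysrv' user=myuserid
        hdfs_datadir='/user/myuserid/data'
        hdfs_metadir='/user/myuserid/meta';

/* Describe the delimited input file so that a .sashdmd metadata file
   is written to the HDFS metadata directory */
proc hdmd name=hdp.custdata
          format=delimited
          sep=','
          data_file='custdata.txt';
   column custid  int;
   column income  double;
   column state   char(2);
run;
```

Because HDFS_METADIR= is set on the libref, tables written through it get metadata files generated automatically; PROC HDMD is needed mainly for files that already exist in HDFS.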
-
Specify the Hadoop connection
attributes in the INDCONN macro variable. For example:
%let indconn=user=myuserid;
-
Run the %INDHD_PUBLISH_MODEL macro.
With traditional model
scoring, the %INDHD_PUBLISH_MODEL macro
performs the following tasks using some of the files that are created
by the SAS Enterprise Miner Score Code Export node: the scoring model
program (score.sas file), the properties file (score.xml file), and
(if the training data includes SAS user-defined formats) a format
catalog:
-
translates the scoring model into
the sasscore_modelname.ds2
file that is used to run scoring inside the SAS Embedded Process.
-
takes the format catalog, if available,
and produces the sasscore_modelname_ufmt.xml
file. This file contains user-defined formats for the scoring model
that is being published.
-
uses SAS/ACCESS Interface to Hadoop
to copy the sasscore_modelname.ds2
and sasscore_modelname_ufmt.xml
scoring files to the HDFS.
With analytic store
scoring, the %INDHD_PUBLISH_MODEL
macro takes the files that are created by the SAS Factory Miner HPFOREST
or HPSVM components: the DS2 scoring model program (score.sas file),
the analytic store file (score.sasast file), and (if the training
data includes SAS user-defined formats) a format catalog, and copies
them to the HDFS.
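A hypothetical invocation for a traditional model is sketched below. It assumes the Score Code Export node output was saved to a local directory; the directory names, model name, and the exact parameter names are assumptions, so check the macro reference for your release:

```sas
%let indconn=user=myuserid;

%indhd_publish_model(
   dir=C:\models\credit,             /* local folder with score.sas, score.xml */
   datastep=score.sas,               /* scoring model program */
   xml=score.xml,                    /* properties file */
   modeldir=/user/myuserid/models,   /* HDFS target directory */
   modelname=creditscore,
   action=create);
```

The macro translates score.sas into sasscore_creditscore.ds2 (and, if a format catalog is supplied, produces sasscore_creditscore_ufmt.xml) and copies both files to the HDFS directory named in MODELDIR=.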
-
Run the %INDHD_RUN_MODEL macro.
The %INDHD_PUBLISH_MODEL macro publishes the model to Hadoop, making the model
available to run against data that is stored in the HDFS.
The %INDHD_RUN_MODEL macro starts a MapReduce job that uses the files generated
by the %INDHD_PUBLISH_MODEL macro
to execute the DS2 program. The MapReduce job stores the DS2 program
output in the HDFS location that is specified by either the OUTPUTDATADIR=
argument or by the <outputDir> element in the HDMD file.
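Continuing the sketch above, a run of the published model might look like the following. All paths and the parameter names are assumptions for illustration; consult the macro reference for your release:

```sas
%indhd_run_model(
   inputdatadir=/user/myuserid/data/custdata,            /* HDFS input data */
   inputmetadir=/user/myuserid/meta/custdata.sashdmd,    /* its .sashdmd file */
   outputdatadir=/user/myuserid/output/scores,           /* where scores land */
   outputmetadir=/user/myuserid/meta/scores.sashdmd,
   scorepgm=/user/myuserid/models/creditscore/sasscore_creditscore.ds2);
```

The MapReduce job runs the DS2 scoring program inside the SAS Embedded Process and writes its output to the directory given by OUTPUTDATADIR=.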
-
Submit an SQL query
against the output file.
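For example, the scored output can be read back through a Hadoop libref and queried with PROC SQL. The libref, server, paths, and output column names here are placeholders:

```sas
/* Hypothetical libref pointing at the scoring output and its metadata */
libname hdp hadoop server='mysrv' user=myuserid
        hdfs_datadir='/user/myuserid/output'
        hdfs_metadir='/user/myuserid/meta';

proc sql;
   select custid, em_classification
      from hdp.scores;
quit;
```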