Using the SAS Embedded Process to Run Scoring Models

How to Run a Scoring Model with the SAS Embedded Process

The integration of the SAS Embedded Process and Netezza allows scoring code to run directly using the SAS Embedded Process on Netezza.
Note: The SAS Embedded Process might require a later release of Netezza than function-based scoring. For more information, see the SAS Foundation system requirements documentation for your operating environment.
To run the scoring model using the SAS Embedded Process, follow these steps.
  1. Create a table to hold the scoring files.
    The %INDNZ_CREATE_MODELTABLE macro creates a table that holds the scoring files for the model that is being published.
    For more information, see Creating a Model Table.
  2. Run the %INDNZ_PUBLISH_MODEL to create the scoring files.
    The %INDNZ_PUBLISH_MODEL macro uses some of the files that are created by the SAS Enterprise Miner Score Code Export node: the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog.
    The %INDNZ_PUBLISH_MODEL macro performs the following tasks:
    • translates the scoring model into the sasscore_modelname.ds2 file that is used to run scoring inside the SAS Embedded Process
    • takes the format catalog, if available, and produces the sasscore_modelname_ufmt.xml file. This file contains user-defined formats for the scoring model that is being published.
    • uses the SAS/ACCESS Interface to Netezza to insert the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files into the model table that was created using the %INDNZ_CREATE_MODELTABLE macro.
  3. Execute the SAS_EP stored procedure to run the scoring model.
    For more information, see SAS_EP Stored Procedure.

Creating a Model Table

Overview

When using the SAS Embedded Process to publish a scoring model in Netezza, you must create a table to hold the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files. You must run the %INDNZ_CREATE_MODELTABLE macro to create the table before you run the %INDNZ_PUBLISH_MODEL macro.
You have to create the table only one time to hold a model’s scoring files.
The model table contains the following columns. The ModelName column is the table key. The table is referenced by the two-level name model-name.model-table-name.
Column Name
Description
Specification
ModelName
contains the name of the model.
NVARCHAR(128) NOT NULL
ModelSequence
contains the sequence number for the block of ModelDS2 data.
The ModelDS2 files is divided into blocks with each block in one row for one model.
INTEGER NOT NULL
ModelDS2
contains the sasscore_modelname.ds2 file.
NVARCHAR (8000)
ModelFormats
contains the sasscore_modelname_ufmt.xml file.
NVARCHAR (8000)
ModelUUID1
contains the UUID of the source model.
VARCHAR (36)
Notes1
contains additional information that describes the source model.
VARCHAR (512)
1This column is for use by SAS Model Manager. If you have a model table that was created prior to SAS 9.4 and you want this column in your model table, you must run the %INDNZ_CREATE_MODELTABLEmacro to re-create your model table.

%INDNZ_CREATE_MODELTABLE Run Process

To run the %INDNZ_CREATE_MODELTABLE macro, complete the following steps:
  1. Start SAS and submit the following command in the Program Editor or Enhanced Editor:
    %let indconn = server=myserver user=myuserid password=xxxx database=mydb
    schema=myschema;
    For more information, see the INDCONN Macro Variable.
  2. Run the %INDNZ_CREATE_MODELTABLE macro.
    For more information, see %INDNZ_CREATE_MODELTABLE Macro Syntax.

%INDNZ_CREATE_MODELTABLE Macro Syntax

Arguments

DATABASE=database-name

specifies the name of a Netezza database where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files are held.

Default The database specified in the INDCONN macro variable or your current database.

MODELTABLE=model-table-name

specifies the name of the table that holds the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files.

Default sas_model_table
Requirements The model table name cannot contain a period (.). Otherwise, an error occurs.
The maximum table name length is 128 characters, and it must be a valid Netezza table name.
Interaction The table name that you specify for this macro must be the same table name that is used in the %INDNZ_PUBLISH_MODEL macro.
See %INDNZ_PUBLISH_MODEL Macro Syntax

DBSCHEMA=schema-name

specifies the name of a Netezza schema to which the model table is published.

Restriction This argument is supported only on Netezza v7.0.3 or later.
Interaction The schema that is specified by the DBSCHEMA= argument takes precedence over the schema that you specify in the INDCONN macro variable. If you do not specify a schema in the DBSCHEMA= argument or the INDCONN macro variable, the default schema for the target database is used.

ACTION = CREATE | REPLACE | DROP

specifies one of the following actions that the macro performs:

CREATE

creates a new table.

Tip If the table has been previously defined and you specify ACTION=CREATE, an error is issued.

REPLACE

overwrites the current table, if a table with the same name is already registered.

Tip If you specify ACTION = REPLACE, and the current table contains sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml files, the files are deleted and an empty table is re-created.

DROP

causes all models in this table to be dropped.

Default CREATE

SAS_EP Stored Procedure

Overview of the SAS_EP Stored Procedure

The SAS_EP stored procedure is the interface for running the scoring model inside Netezza with the SAS Embedded Process. The SAS_EP stored procedure uses the files that are stored in the model table. The stored procedure parameters enable you to control the name and location of the output table, how much data is returned, and how it is returned.
The SAS_EP stored procedure is installed in the NZRC directory. To run the stored procedure, you must have the following permissions:
  • User name must be created with the IBM Netezza Analytics utility
  • Database must be created with the IBM Netezza Analytics utility
For more information, see Netezza Permissions.

Running the SAS_EP Stored Procedure

You can run the SAS_EP stored procedure using explicit pass-through and PROC SQL or you can use NZSQL.
Note: Before running the SAS_EP stored procedure, you must create the model table with the %INDNZ_CREATE_MODELTABLE macro and then you must publish the files to the model table with the %INDNZ_PUBLISH_MODEL macro.
Here is an example using PROC SQL.
proc sql;
   connect to netezza (user=userid password=xxxx server=myserver);
   execute 
      (CALL NZRC..SAS_EP
         ('
           model_name=intr_reg,
           input_table=intr_reg_20m,
           input_columns= id count dif_srvr flag hot sam_srat service srv_cnt,
           output_table=intr_reg_out,
           journal_table=intr_reg_out_jnl,
           ds2_keep=id EM_CLASSIFICATION EM_EVENTPROBABILITY EM_PROBABILITY
          ');
      ) by netezza;
   disconnect from netezza;
quit;
For more information about the stored procedure parameters, see SAS_EP Stored Procedure Syntax.

SAS_EP Stored Procedure Syntax

Parameters

MODEL_NAME=model-name

specifies the name of the model.

Requirement There must be at least one row in the model table with this key in the ModelName column and with a non-null value in the ModelDS2 column.
Tip The model name is case sensitive.

INPUT_TABLE=<<"><input-database<">..>><">input-table<">

specifies the name of the scoring model input table.

Requirement If the input table name is case sensitive, you must use double quotation marks (for example, "myinDB..myinTABLE").

OUTPUT_TABLE=<<">output-database<">.. ><">output-table<">

specifies the name of the scoring model output table.

Requirement If the output table name is case sensitive, you must use double quotation marks (for example, "myoutDB..myoutTABLE").

MODEL_TABLE=<<">model-table-database<">.. ><">model-table<">

specifies the name of the model table where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files were published with the %INDTD_CREATE_MODELTABLE macro.

Default sas_model_table
Restriction This argument is available only when using the SAS Embedded Process.
Requirements If the model table name is case sensitive, you must use double quotation marks(for example, "mymtDB..mymtTABLE").
The name of the model table must be the same as the name specified in the %INDNZ_CREATE_MODELTABLE macro. For more information, see the MODELTABLE argument in %INDNZ_CREATE_MODELTABLE Macro Syntax.

INPUT_COLUMNS=<">column<"><...<">column<">>

specifies one or more columns from the input table that are passed to the SAS Embedded Process.

Default All columns
Requirement If the column name is case sensitive, you must use double quotation marks (for example, "Profit" "Sales" "Margin").

INPUT_WHERE=where-clause

specifies a valid WHERE clause that selects the rows from the input table.

OUTPUT_DISTRIBUTION=<">column<"><, ...<">column<">>

specifies one or more columns that are the distribution key for the output table.

Default Current database distribution
Requirement If the column name is case sensitive, you must use double quotation marks (for example, "Profit", "Sales", "Margin").

OUTPUT_TEMPORARY= YES | NO

specifies whether the output table is temporary.

Default NO

JOURNAL_TABLE=<<">journal-database<">..><">journal-table<">

specifies the name of a table that the SAS_EP stored procedure creates. This table holds any journal messages and notes from the SAS journal facility that are produced when executing the store procedure.

Requirement If the journal table name is case sensitive, you must use double quotation marks (for example, "myjnlDB..myjnlTABLE").
Tip If JOURNAL_TABLE is not specified, a journal is not created. If JOURNAL_TABLE is specified and a journal table exists, the journal is appended.

PARTITION_BY=<">column<"><, ...<">column<">>

specifies one or more columns by which to partition the input.

Default No partitioning occurs
Requirement If the column name is case sensitive, you must use double quotation marks(for example, "Profit", "Sales", "Margin").

ORDER_BY=<">column<"><, ...<">column<">>

specifies one or more columns by which to order the input.

Default No reordering occurs
Requirement If the column name is case sensitive, you must use double quotation marks (for example, "Profit", "Sales", "Margin").

Tips for Using the SAS_EP Stored Procedure

  • No specific parameter order is required.
  • Table names must be enclosed in double quotation marks if they are case sensitive. Otherwise, double quotation marks are optional.
  • Tables can be qualified with a database name. If a table name is not qualified with a database name, the table name is resolved based on the default database for your session.
  • All parameters are passed as strings to the SAS_EP stored procedure, so they must be enclosed in single quotation marks. To pass a single quotation mark as part of the SQL within a parameter, use two adjacent single quotation marks as shown in the following example:
    'INPUT_WHERE=where name like ''%Jones%'''

Netezza Scoring Files

When using the SAS Embedded Process, the %INDNZ_PUBLISH_MODEL macro produces two scoring files for each model:
  • sasscore_modelname.ds2. This file contains code that is executed by the SAS_EP stored procedure.
  • sasscore_modelname_ufmt.xml. This file contains user-defined formats for the scoring model that is being published. This file is used by the SAS_EP stored procedure.
These files are published to the model table that you specify in the %INDNZ_PUBLISH_MODEL macro. See Scoring File Examples for an example of each of these files.
A message that indicates whether the scoring files are successfully or not successfully created is printed to the SAS log.
Although you cannot view the scoring files directly, there are two ways to see the models whose files are created:
  • From Netezza, log on to the database using a client tool such as NZSQL and submit an SQL statement. The following example assumes that the model table where the scoring files were published is modtable and the model name is super.
    nzsql database username password
      
     select modelname, modelsequence from modtable where 
       modelname like '%super%'
  • From SAS, use SQL procedure code that produces output in the LST file. The following example assumes that the model name that you used to create the scoring files is super.
    proc sql noerrorstop;
      connect to netezza (user=username password=xxxx server=myserver);
      
    select * from connection to netezza
          (select modelname, modelsequence
             from sasmodeltablename
                where modelname like '%super%');
    disconnect netezza;
    quit;
    
    You can also use the SASTRACE and SASTRACELOC system options to generate tracing information. For more information about these system options, see the SAS System Options: Reference.