Using the SAS Embedded Process to Run Scoring Models

How to Run a Scoring Model with the SAS Embedded Process

The integration of the SAS Embedded Process and Teradata allows scoring code to run directly using the SAS Embedded Process on Teradata.
Note: The SAS Embedded Process might require a later release of Teradata than function-based scoring. For more information, see http://www.sas.com/technologies/analytics/datamining/scoring_acceleration/#section=5 and SAS In-Database Products: Administrator's Guide.
To run the scoring model using the SAS Embedded Process, follow these steps.
  1. Create a table to hold the scoring files.
    The %INDTD_CREATE_MODELTABLE macro creates a table that holds the scoring files for the model that is being published.
    For more information, see Creating a Model Table.
  2. Run the %INDTD_PUBLISH_MODEL to create the scoring files.
    The %INDTD_PUBLISH_MODEL macro uses some of the files that are created by the SAS Enterprise Miner Score Code Export node: the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog.
    The %INDTD_PUBLISH_MODEL macro performs the following tasks:
    • translates the scoring model into the sasscore_modelname.ds2 file that is used to run scoring inside the SAS Embedded Process.
    • takes the format catalog, if available, and produces the sasscore_modelname_ufmt.xml file. This file contains user-defined formats for the scoring model that is being published.
    • uses the SAS/ACCESS Interface to Teradata to insert the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files into the model table that was created using the %INDTD_CREATE_MODELTABLE macro.
  3. Execute the SAS_SCORE_EP stored procedure to run the scoring model.
    For more information, see SAS_SCORE_EP Stored Procedure.

Creating a Model Table

Overview

When using the SAS Embedded Process to publish a scoring model in Teradata, you must create a table to hold the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files. You must run the %INDTD_CREATE_MODELTABLE macro to create the table before you run the %INDTD_PUBLISH_MODEL macro.
You have to create the table only one time to hold a model’s scoring files.
The model table contains the following columns. The ModelName column is the table key. The table is referenced by the two-level name model-name.model-table-name.
Column Name
Description
Specification
ModelName
contains the name of the model
VARCHAR(128) CHARACTER SET UNICODE CASESPECIFIC
ModelDS2
contains the sasscore_modelname.ds2 file
BLOB(209708800)
ModelFormats
contains the sasscore_modelname_ufmt.xml file
BLOB(209708800)
ModelMetadata
reserved by SAS for future use
BLOB(4M)
ModelOwner
contains the name of the user who published the model
VARCHAR(128) CHARACTER SET UNICODE CASESPECIFIC
ModelUpdated
contains the date and time that the model was published
TIMESTAMP(6)

%INDTD_CREATE_MODELTABLE Run Process

To run the %INDTD_CREATE_MODELTABLE macro, complete the following steps:
  1. Start SAS 9.3 and submit the following commands in the Program Editor or Enhanced Editor:
    %indtdpm;
    %let indconn = server=myserver user=myuserid password=xxxx database=mydb;
    For more information, see %INDTDPM Macro and the INDCONN Macro Variable.
  2. Run the %INDTD_CREATE_MODELTABLE macro.
    For more information, see %INDTD_CREATE_MODELTABLE Macro Syntax.

%INDTD_CREATE_MODELTABLE Macro Syntax

%INDTD_CREATE_MODELTABLE
(<DATABASE=database-name>
<, MODELTABLE=model-table-name>
<, ACTION=CREATE | REPLACE | DROP>
);
Arguments

DATABASE=database-name

specifies the name of a Teradata database where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files are held.

Default The database specified in the INDCONN macro variable or your current database

MODELTABLE=model-table-name

specifies the name of the table that holds the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files.

Default sas_model_table
Requirement The maximum table name length is 30 characters, and it must be a valid Teradata table name.
Interaction The table name that you specify for this macro must be the same table name that is used in the %INDTD_PUBLISH_MODEL macro.
See %INDTD_PUBLISH_MODEL Macro Syntax

ACTION = CREATE | REPLACE | DROP

specifies one of the following actions that the macro performs:

CREATE

creates a new table.

Tip If the table has been previously defined and you specify ACTION=CREATE, an error is issued.

REPLACE

overwrites the current table, if a table with the same name is already registered.

Tip If you specify ACTION = REPLACE, and the current table contains sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml files, the files are deleted and an empty table is re-created.

DROP

causes all models in this table to be dropped.

Default CREATE

SAS_SCORE_EP Stored Procedure

Overview of the SAS_SCORE_EP Stored Procedure

The SAS_SCORE_EP stored procedure is the interface for running the scoring model inside Teradata with the SAS Embedded Process. The SAS_SCORE_EP stored procedure uses the files that are stored in the model table. The stored procedure parameters enable you to control the name and location of the output table, how much data is returned, and how it is returned.
The SAS_SCORE_EP stored procedure is installed in the SAS_SYSFNLIB database. To run the stored procedure, you must have the following permissions:
  • EXECUTE PROCEDURE permission on the SAS_SYSFNLIB database
  • EXECUTE FUNCTION permission on the SAS_SYSFNLIB database
  • EXECUTE FUNCTION ON SYSLIB.MonitorVirtualConfig permission on the SYSLIB.MonitorVirtualConfig function
For more information, see Teradata Permissions.

Running the SAS_SCORE_EP Stored Procedure

You can run the SAS_SCORE_EP stored procedure using explicit pass-through and PROC SQL or you can use other Teradata query tools.
Tip
Look at the SampleSQL.txt file that is produced when the %INDTD_PUBLISH_MODEL macro is successfully run. This file can be found in the output directory (OUTDIR argument) that you specify in the %INDTD_PUBLISH_MODEL macro. The SampleSQL.txt file contains basic SQL code that, with modifications, can be used to run your score code inside Teradata.
Note: Before running the SAS_SCORE_EP stored procedure, you must create the model table with the %INDTD_CREATE_MODELTABLE macro and then you must publish the files to the model table with the %INDTD_PUBLISH_MODEL macro.
Here is an example using PROC SQL.
proc sql;
   connect to teradata (user=userid password=xxxx server=myserver mode=Teradata);
   execute 
      (CALL SAS_SYSFNLIB.SAS_SCORE_EP
         (
            'MODELTABLE="grotto”.”sas_publish_model”',
            'MODELNAME=Intr_Tree',
            'INQUERY=SELECT * from ”grotto”.”score_input_table” WHERE x1 < 1.0',
            'OUTTABLE=”grotto”.”sas_score_out1”',
            'OUTKEY=id',
            'OPTIONS=’  /* can be blank or NULL if no options are needed */
         )
      ) by teradata;
   disconnect from teradata;
quit;
Note: You must specify MODE=TERADATA in your connection string.
For more information about the stored procedure parameters, see SAS_SCORE_EP Stored Procedure Syntax.

SAS_SCORE_EP Stored Procedure Syntax

SAS_SYSFNLIB.SAS_SCORE_EP
('MODELTABLE="database"."model-table-name" ',
'MODELNAME=model-name',
'INQUERY=SELECT ...',
'OUTTABLE=< "output-database-name". >"output-table-name" ',
'OUTKEY=column<..., column,> | NO PRIMARY INDEX,
NULL | 'OPTIONS=' | 'OPTIONS=option; <...; option;>'
);
Parameters

database

specifies the name of the database that contains the model table.

Default Database in the current session
Requirement The database must be the same as the one specified in the %INDTD_PUBLISH_MODEL macro’s DATABASE argument.

model-table-name

specifies the name of the model table where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files were published with the %INDTD_CREATE_MODELTABLE macro.

model-name

specifies the name of the model.

SELECT ...

specifies a SELECT statement that defines the inputs to the SAS Embedded Process.

Range The INQUERY= parameter string can be up to 30,000 characters long.
Restrictions The maximum number of characters in the query is 30,000.
The maximum number of input and output colums is 1024.
Requirements If the query is greater than 1,000 characters, the INQUERY= parameter must be the first parameter listed in the stored procedure.
The SELECT statement must be syntactically correct SQL.
Interaction A query can reference tables or views, except if you specify the DIRECT option. If you specify the DIRECT option, the query can reference only tables.
Tip To pass single quotation marks around character literals, use two adjacent single quotation marks. This is an example.
'INQUERY=select * from my_input_tbl where name like ''%Jones%'''

< "output-database-name". >"output-table-name"

specifies the location of the scoring model output table.

output-database-name

specifies the database where the table is created or currently exists.

Default Current database

output-table-name

specifies the name of the table to be created or the table that currently exists.

Requirement The output table can already exist. If the output table already exists, scored rows are inserted into the table along with any existing rows. If the output table already exists, the output columns must match the existing table’s columns for the insert to succeed.
Interaction The output table can be a temporary table by adding VOLATILE=YES in the OPTIONS parameter. The temporary table can be used only for the duration of the SQL session where it is created.

column

specifies the column or columns that are used as the primary index for the output table.

Requirements The column must exist in the output table.
If there are multiple primary index columns, the column names must be separated by commas.

NO PRIMARY INDEX

specifies that there is no primary index for the output table and that output rows are placed on the same Teradata Access Module Processor (AMP) that ran the scoring code for the corresponding input row.

NULL | 'OPTIONS='

specifies that no options are used.

Tip You can use either 'OPTIONS=' or NULL to indicate that no options are used.
Example These two code lines are identical.
call sas_sysfnlib.sas_score_ep ('modeltable=...', modelname=...', 
'inquery=...', 'outtable=scored_output1', 'outkey=no primary index', 
null);
call sas_sysfnlib.sas_score_ep ('modeltable=...', modelname=...', 
'inquery=...', 'outtable=scored_output1', 'outkey=no primary index',
'options=');

option

specifies additional options for the stored procedure. option can be any of the following values:

CONTRACT=YES | NO

specifies whether to write the output metadata to the table specified in the OUTTABLE= parameter.

Default NO
Interaction If you specify CONTRACT=YES, the OUTKEY= parameter is ignored.
Tip The output is written to the table in the form of one row per output value with the sequence, name, data type, and length for each output.

DIRECT=YES | NO

specifies whether direct retrieve optimization is enabled.

Default NO
Interaction This option affects the stored procedure SQL generation.
Tip The direct retrieve optimization improves performance in the case where the input to the SAS Embedded Process is a table and the input query is SELECT * FROM table. When DIRECT=YES, the INQUERY= parameter is only the table name. No SELECT statement is needed.

DS2_KEEP=column-name<...column-name>

specifies the column or columns that are passed to the SAS_SCORE_EP procedure and are applied as a dynamic KEEP= statement in the sasscore_modelname.ds2 file.

Requirement If more than one column is specified, column names must be separated with spaces.
Interaction Specify CONTRACT=YES to preview the available output columns without executing the model.

ENCODING= LATIN | UNICODE

specifies the character data encoding for internationalization.

Default LATIN
See SAS National Language Support (NLS): Reference Guide

EPTRACE=YES

specifies that journal messages are written to the journal table.

HASHBY=column-name<..., column-name>

specifies one or more columns to use for the HASH BY clause.

Requirement If more than one column is specified, column names must be separated with commas.
Interaction This option affects the stored procedure SQL generation.
Note Data is redistributed by hash code to the TERADATA AMPs based on this column or columns although there is no implied ordering to the groups.

JOURNALTABLE=journal-table-name

specifies the name of a table that the stored procedure creates. This table holds any journal messages and notes from the SAS journal facility that are produced when executing the store procedure.

Requirement The name must follow Teradata naming conventions for table names.
Note Use a SELECT statement to retrieve the journal messages from the table after the stored procedure call is complete.

LOCALE=sas-locale

specifies set of attributes in the SAS session that reflect the language, local conventions, and culture for a geographical region.

Requirement sas-locale must be one of the five-character POSIX values, for example fr_FR.
See SAS National Language Support (NLS): Reference Guide

ORDERBY=column-name<..., column-name>

specifies one or more columns to use for the LOCAL ORDER BY (BY groups) clause.

Requirement If more than one column is specified, column names must be separated with commas.
Interaction This option affects the stored procedure SQL generation.

SELECT_LIST=column-name<..., column-name>

specifies the column or columns that are used in the SQL that is generated by the SAS_SCORE_EP stored procedure.

Default * (asterisk) which indicates all columns
Requirement If more than one column is specified, column names must be separated with commas.

SQLTRACE=table-name

specifies the name of a table to hold the generated SQL code.

Tip This table is useful for stored procedure debugging or to reference later if you want to customize the SQL code that is used to call the SAS Embedded Process.

UNIQUE=YES | NO

specifies whether the primary index of the output table is unique.

Default NO

VOLATILE=YES | NO

specifies whether the output table is created as a temporary table.

Default NO
Interaction This option affects the stored procedure SQL generation.

Range The OPTIONS= parameter string can be from 0–20,000 characters long.
Requirements Each option must end with a semicolon, including the last option in the list.
If the OPTIONS= parameter string is greater than 1,000 characters, it must be the last parameter.
Note option can be blank or NULL if no options are needed.
Tip Options that are not recognized as directives to the stored procedure are passed to the SAS Embedded Process as Query Band name-value pairs. If the SAS Embedded Process does not recognize them, they are ignored. Up to ten user-defined Query Band name-value pairs can be specified in addition to the options listed here that are Query Band name-value pairs. The maximum length of the query band is 2048 characters. User-defined Query Band information is logged in Teradata Database Query Log (DBQL) that makes it useful for workload analysis and reporting.

Tips for Using the SAS_SCORE_EP Stored Procedure

  • No specific parameter order is required, but if the INQUERY parameter string is greater than 1,000 characters, it must be the first parameter. Similarly, if the OPTIONS parameter string is greater than 1,000 characters, it must be the last parameter.
  • Database object names, for example tables and columns, must be enclosed in double quotation marks if they are Teradata reserved words. Otherwise, quotation marks are optional.
  • Tables can be qualified with a database name. If a table name is not qualified with a database name, the table name is resolved based on the default database for your session.
  • All parameters are passed as strings to the SAS_SCORE_EP stored procedure, so they must be enclosed in single quotation marks. To pass a single quotation mark as part of the SQL within a parameter, use two adjacent single quotation marks as shown in the following example:
    'INQUERY=select * from my_input_tbl where name like ''%Jones%''',

Teradata Scoring Files

When using the SAS Embedded Process, the %INDTD_PUBLISH_MODEL macro produces two scoring files for each model:
  • sasscore_modelname.ds2. This file contains code that is executed by the SAS_SCORE_EP stored procedure.
  • sasscore_modelname_ufmt.xml. This file contains user-defined formats for the scoring model that is being published. This file is used by the SAS_SCORE_EP stored procedure.
These files are published to the model table that you specify in the %INDTD_PUBLISH_MODEL macro. See Scoring File Examples for an example of each of these files.
A message that indicates whether the scoring files are successfully or not successfully created is printed to the SAS log.
Although you cannot view the scoring files directly, there are two ways to see the models whose files are created:
  • Log on to the database using BTEQ and submit an SQL statement. The following example assumes that the model table where the scoring files were published is register and the model name is reg1.
    bteq .logon myserver/myuserid,mypassword
    select modelname, modelowner, modelupdated from register 
       where modelname like '%reg1%';
    The model name, user ID, and date and time the model files were published are listed.
  • From SAS, use SQL procedure code that produces output in the LST file. The following example assumes that the model name that you used to create the scoring files is reg.
    proc sql noerrorstop;
      connect to teradata (user=username password=xxxx server=myserver);
      
    select * from connection to teradata
          (select modelname,modelowner,modelupdated
             from sasmodeltablename
                where modelname like '%reg%');
    disconnect teradata;
    quit;
    
    You can also use the SASTRACE and SASTRACELOC system options to generate tracing information. For more information about these system options, see the SAS System Options: Reference.

Controlling the SAS Embedded Process

The SAS Embedded Process starts when a query is submitted. It continues to run until it is manually stopped or the database is shut down.
You can check the status of the SAS Embedded Process or disable it so that no new queries can be started. Use the following commands to perform those actions.
Command
Action performed
CALL DBCEXTENSION.SERVERCONTROL ('STATUS', :A);
Provides the status of the SAS Embedded Process.
CALL DBCEXTENSION.SERVERCONTROL ('DISABLE', :A);
Stops new queries from being started. Queries that are currently running continue to run until they are complete.
CALL DBCEXTENSION.SERVERCONTROL ('ENABLE', :A);
Enables waiting and new queries to start running.