Using the SAS Embedded Process to Run Scoring Models

How to Run a Scoring Model with the SAS Embedded Process

The integration of the SAS Embedded Process and Teradata allows scoring code to run directly using the SAS Embedded Process on Teradata.
Note: The SAS Embedded Process might require a later release of Teradata than function-based scoring does. For more information, see the SAS Foundation system requirements documentation for your operating environment.
To run the scoring model using the SAS Embedded Process, follow these steps.
  1. Create a table to hold the scoring files.
    The %INDTD_CREATE_MODELTABLE macro creates a table that holds the scoring files for the model that is being published.
    For more information, see Creating a Model Table.
  2. Run the %INDTD_PUBLISH_MODEL macro.
    • With traditional model scoring, the %INDTD_PUBLISH_MODEL macro performs the following tasks using some of the files that are created by the SAS Enterprise Miner Score Code Export node: the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog.
      • translates the scoring model into the sasscore_modelname.ds2 file that is used to run scoring inside the SAS Embedded Process.
      • takes the format catalog, if available, and produces the sasscore_modelname_ufmt.xml file. This file contains user-defined formats for the scoring model that is being published.
      • uses SAS/ACCESS Interface to Teradata to insert the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files into the model table that was created using the %INDTD_CREATE_MODELTABLE macro.
    With analytic store scoring, the %INDTD_PUBLISH_MODEL macro takes the files that are created by the SAS Factory Miner HPFOREST or HPSVM components: the DS2 scoring model program (score.sas file), the analytic store file (score.sasast file), and (if the training data includes SAS user-defined formats) a format catalog, and inserts their contents into the model table that was created using the %INDHN_CREATE_MODELTABLE macro.
  3. Execute the SAS_SCORE_EP stored procedure to run the scoring model.
    For more information, see SAS_SCORE_EP Stored Procedure.

Creating a Model Table

Overview

When using the SAS Embedded Process to publish a scoring model in Teradata, you must create a table to hold the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files. You must run the %INDTD_CREATE_MODELTABLE macro to create the table before you run the %INDTD_PUBLISH_MODEL macro.
You have to create the table only one time to hold a model’s scoring files.
The model table contains the following columns. The ModelName column is the table key. The table is referenced by the two-level name model-name.model-table-name.
Column Name
Description
Specification
ModelName
contains the name of the model
VARCHAR(128) CHARACTER SET UNICODE CASESPECIFIC
ModelDS2
contains the sasscore_modelname.ds2 file
BLOB(209708800)
ModelFormats
contains the sasscore_modelname_ufmt.xml file
BLOB(209708800)
ModelOwner2
contains the name of the user who published the model
VARCHAR(128) CHARACTER SET UNICODE CASESPECIFIC
ModelUpdated2
contains the date and time that the model was published
TIMESTAMP(6)
ModelUUID1
contains the UUID of the source model
VARCHAR(36) CHARACTER SET UNICODE NOT CASESPECIFIC
Notes1
contains additional information that describes the source model
VARCHAR(512) CHARACTER SET UNICODE NOT CASESPECIFIC
1This column is for use by SAS Model Manager. If you have a model table that was created prior to SAS 9.4 and you want this column in your model table, you must run the %INDTD_CREATE_MODELTABLE macro to re-create your model table.
2This column exists in model tables that were run in SAS 9.3 and earlier releases. This column is compatible with SAS 9.4, but it is not created if you run the %INDTD_CREATE_MODELTABLE macro in SAS 9.4. The ModelUUID and Notes columns are created instead.

%INDTD_CREATE_MODELTABLE Run Process

To run the %INDTD_CREATE_MODELTABLE macro, complete the following steps:
  1. Start SAS and submit the following command in the Program Editor or Enhanced Editor:
    %let indconn = server=myserver user=myuserid password=xxxx database=mydb;
    For more information, see the INDCONN Macro Variable.
  2. Run the %INDTD_CREATE_MODELTABLE macro.
    For more information, see %INDTD_CREATE_MODELTABLE Macro Syntax.

%INDTD_CREATE_MODELTABLE Macro Syntax

Arguments

DATABASE=database-name

specifies the name of a Teradata database where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files are held.

Default The database specified in the INDCONN macro variable or your current database
Requirement The maximum database name length is 128 characters, and it must be a valid Teradata database name.

MODELTABLE=model-table-name

specifies the name of the table that holds the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files.

Default sas_model_table
Requirement The maximum table name length is 128 characters, and it must be a valid Teradata table name.
Interaction The table name that you specify for this macro must be the same table name that is used in the %INDTD_PUBLISH_MODEL macro.
See %INDTD_PUBLISH_MODEL Macro Syntax

ACTION = CREATE | REPLACE | DROP

specifies one of the following actions that the macro performs:

CREATE

creates a new table.

Tip If the table has been previously defined and you specify ACTION=CREATE, an error is issued.

REPLACE

overwrites the current table, if a table with the same name is already registered.

Tip If you specify ACTION = REPLACE, and the current table contains sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml files, the files are deleted and an empty table is re-created.

DROP

causes all models in this table to be dropped.

Default CREATE

SAS_SCORE_EP Stored Procedure

Overview of the SAS_SCORE_EP Stored Procedure

The SAS_SCORE_EP stored procedure is the interface for running the scoring model inside Teradata with the SAS Embedded Process. The SAS_SCORE_EP stored procedure uses the files that are stored in the model table. The stored procedure parameters enable you to control the name and location of the output table, how much data is returned, and how it is returned.
The SAS_SCORE_EP stored procedure is installed in the SAS_SYSFNLIB database. To run the stored procedure, you must have the following permissions:
  • EXECUTE PROCEDURE permission on the SAS_SYSFNLIB database
  • EXECUTE FUNCTION permission on the SAS_SYSFNLIB database
  • EXECUTE FUNCTION ON SYSLIB.MonitorVirtualConfig permission on the SYSLIB.MonitorVirtualConfig function
For more information, see Teradata Permissions.

Running the SAS_SCORE_EP Stored Procedure

You can run the SAS_SCORE_EP stored procedure using explicit pass-through and PROC SQL or you can use other Teradata query tools.
Tip
Look at the SampleSQL.txt file that is produced when the %INDTD_PUBLISH_MODEL macro is successfully run. This file can be found in the output directory (OUTDIR argument) that you specify in the %INDTD_PUBLISH_MODEL macro. The SampleSQL.txt file contains basic SQL code that can be used to run your score code inside Teradata. Please note that you must modify the sample code before using it. Otherwise, the sample code returns an error.
Note: Before running the SAS_SCORE_EP stored procedure, you must create the model table with the %INDTD_CREATE_MODELTABLE macro. Then, you must publish the files to the model table with the %INDTD_PUBLISH_MODEL macro.
Here is an example using PROC SQL.
proc sql;
   connect to teradata (user=userid password=xxxx server=myserver mode=Teradata);
   execute 
      (CALL SAS_SYSFNLIB.SAS_SCORE_EP
         (
            'MODELTABLE="grotto"."sas_publish_model"',
            'MODELNAME=Intr_Tree',
            'INQUERY=SELECT * from "grotto"."score_input_table" WHERE x1 < 1.0',
            'OUTTABLE="grottov."sas_score_out1"',
            'OUTKEY=id',
            'OPTIONS='  /* can be blank or NULL if no options are needed */
         )
      ) by teradata;
   disconnect from teradata;
quit;
Note: You must specify MODE=TERADATA in your connection string.
For more information about the stored procedure parameters, see SAS_SCORE_EP Stored Procedure Syntax.

SAS_SCORE_EP Stored Procedure Syntax

SAS_SYSFNLIB.SAS_SCORE_EP
('MODELTABLE="database"."model-table-name" ',
'MODELNAME=model-name',
'INQUERY=SELECT ...'
'OUTTABLE= "output-database-name". "output-table-name" ',
'OUTKEY=column<..., column,> | NO PRIMARY INDEX,
NULL | 'OPTIONS=' | 'OPTIONS=option; <...; option;>'
);
Parameters

"database"

specifies the name of the database that contains the model table.

Default Database in the current session
Requirements The database must be the same as the one specified in the %INDTD_PUBLISH_MODEL macro’s DATABASE argument.
The maximum database name length is 128 characters, and it must be a valid Teradata database name.

"model-table-name"

specifies the name of the model table where the sasscore_modelname.ds2 and sasscore_modelname_ufmt.xml scoring files were published with the %INDTD_PUBLISH_MODEL macro.

Requirements The model name must be the same as the one specified in the %INDTD_PUBLISH_MODEL macro’s MODELNAME argument.
The maximum table name length is 128 characters, and it must be a valid Teradata table name.

model-name

specifies the name of the model.

SELECT ...

specifies a SELECT statement that defines the inputs to the SAS Embedded Process.

Range The INQUERY= parameter string can be up to 30,000 characters long.
Restrictions The maximum number of characters in the query is 30,000.
The maximum number of input and output columns is 1024.
Requirements If the query is greater than 1,000 characters, the INQUERY= parameter must be the first parameter listed in the stored procedure.
The SELECT statement must be syntactically correct SQL.
Interaction A query can reference tables or views, except if you specify the DIRECT option. If you specify the DIRECT option, the query can reference only tables.
Tips If you want to query all data from the input data without any filtering or subsetting ('INQUERY=SELECT * FROM table'), you can use the table name in the INQUERY argument. However, you must also add DIRECT=YES to the OPTIONS argument. Here is an example
...
'inquery="myDatabase"."myTableName"',
'options=direct=yes',
...
To pass single quotation marks around character literals, use two adjacent single quotation marks. This is an example.
'INQUERY=select * from my_input_tbl where name like ''%Jones%'''

"output-database-name". "output-table-name"

specifies the location of the scoring model output table.

"output-database-name"

specifies the database where the table is created or currently exists.

Default Current database
Requirement The maximum database name length is 128 characters, and it must be a valid Teradata database name.

"output-table-name"

specifies the name of the table to be created or the table that currently exists.

Requirement The maximum table name length is 128 characters, and it must be a valid Teradata table name.
Requirement The output table can already exist. If the output table already exists, scored rows are inserted into the table along with any existing rows. If the output table already exists, the output columns must match the existing table’s columns for the insert to succeed.
Interaction The output table can be a temporary table by adding VOLATILE=YES in the OPTIONS parameter. The temporary table can be used only for the duration of the SQL session where it is created.

column

specifies the column or columns that are used as the primary index for the output table.

Requirements The column must exist in the output table.
If there are multiple primary index columns, the column names must be separated by commas.
Tip Specifying the same primary index for both the input and output tables enables Teradata to avoid redistribution of data across its AMPs.

NO PRIMARY INDEX

specifies that there is no primary index for the output table and that output rows are placed on the same Teradata Access Module Processor (AMP) that ran the scoring code for the corresponding input row.

NULL | 'OPTIONS='

specifies that no options are used.

Tip You can use either 'OPTIONS=' or NULL to indicate that no options are used.
Example These two code lines are identical.
call sas_sysfnlib.sas_score_ep ('modeltable=...', modelname=...', 
'inquery=...', 'outtable=scored_output1', 'outkey=no primary index', 
null);
call sas_sysfnlib.sas_score_ep ('modeltable=...', modelname=...', 
'inquery=...', 'outtable=scored_output1', 'outkey=no primary index',
'options=');

option

specifies additional options for the stored procedure. option can be any of the following values:

CONTRACT=YES | NO

specifies whether to write the output metadata to the table specified in the OUTTABLE= parameter.

Default NO
Interaction If you specify CONTRACT=YES, the OUTKEY= parameter is ignored.
Tip The output is written to the table in the form of one row per output value with the sequence, name, data type, and length for each output.

DIRECT=YES | NO

specifies whether direct retrieve optimization is enabled.

Default NO
Interaction This option affects the stored procedure SQL generation.
Tip The direct retrieve optimization improves performance in the case where the input to the SAS Embedded Process is a table and the input query is SELECT * FROM table. When DIRECT=YES, the INQUERY= parameter is only the table name. No SELECT statement is needed.

DS2_KEEP=column-name<... column-name>

specifies the column or columns that are passed to the SAS_SCORE_EP procedure and are applied as a dynamic KEEP= statement in the sasscore_modelname.ds2 file.

Requirements If more than one column is specified, column names must be separated with spaces.
The maximum column name length is 128 characters, and it must be a valid Teradata column name.
Interaction Specify CONTRACT=YES to preview the available output columns without executing the model.

ENCODING= LATIN | UNICODE

specifies the character data encoding for the column data. This is for internationalization purposes.

Default LATIN
See SAS National Language Support (NLS): Reference Guide

EPTRACE=YES

specifies that journal messages are written to the journal table.

HASHBY=column-name<..., column-name>

specifies one or more columns to use for the HASH BY clause.

Requirements If more than one column is specified, column names must be separated with commas.
The maximum column name length is 128 characters, and it must be a valid Teradata column name.
Interaction This option affects the stored procedure SQL generation.
Note Data is redistributed by hash code to the TERADATA AMPs based on this column or columns although there is no implied ordering to the groups.

JOURNALTABLE="database-name"."journal-table-name"

specifies the name and location of a table that the stored procedure creates. This table holds any journal messages and notes from the SAS journal facility that are produced when executing the store procedure.

"database-name"

specifies the database where the journal table is created.

Default Current database
Requirement The maximum database name length is 128 characters, and it must be a valid Teradata database name.

"journal-table-name"

specifies the name of the journal table.

Requirement The maximum table name length is 128 characters, and it must be a valid Teradata table name.
Note Use a SELECT statement to retrieve the journal messages from the table after the stored procedure call is complete.

LOCALE=sas-locale

specifies set of attributes in the SAS session that reflect the language, local conventions, and culture for a geographical region.

Requirement sas-locale must be one of the five-character POSIX values (for example, fr_FR).
See SAS National Language Support (NLS): Reference Guide

ORDERBY=column-name<..., column-name>

specifies one or more columns to use for the LOCAL ORDER BY (BY groups) clause.

Requirements If more than one column is specified, column names must be separated with commas.
The maximum column name length is 128 characters, and it must be a valid Teradata column name.
Interaction This option affects the stored procedure SQL generation.

SELECT_LIST=column-name<..., column-name>

specifies the column or columns that are used in the SQL that is generated by the SAS_SCORE_EP stored procedure.

Default * (asterisk) which indicates all columns
Requirements If more than one column is specified, column names must be separated with commas.
The maximum column name length is 128 characters, and it must be a valid Teradata column name.

SQLTRACE="database-name"."table-name"

specifies the name and location of a table to hold the generated SQL code.

"database-name"

specifies the database where the journal table is created.

Default Current database
Requirement The maximum database name length is 128 characters, and it must be a valid Teradata database name.

"table-name"

specifies the name of the table.

Requirement The maximum table name length is 128 characters, and it must be a valid Teradata table name.
Tip This table is useful for stored procedure debugging or to reference later if you want to customize the SQL code that is used to call the SAS Embedded Process.

UNIQUE=YES | NO

specifies whether the primary index of the output table is unique.

Default NO

VOLATILE=YES | NO

specifies whether the output table is created as a temporary table.

Default NO
Interaction This option affects the stored procedure SQL generation.
Range The OPTIONS= parameter string can be from 0–20,000 characters long.
Requirements Each option must end with a semicolon, including the last option in the list.
If the OPTIONS= parameter string is greater than 1,000 characters, the OPTIONS= parameter must be the last one.
Note option can be blank or NULL if no options are needed.
Tip Options that are not recognized as directives to the stored procedure are passed to the SAS Embedded Process as Query Band name-value pairs. If the SAS Embedded Process does not recognize them, they are ignored. Up to ten user-defined Query Band name-value pairs can be specified in addition to the options listed here that are Query Band name-value pairs. The maximum length of the query band is 2048 characters. User-defined Query Band information is logged in Teradata Database Query Log (DBQL) that makes it useful for workload analysis and reporting.

Tips for Using the SAS_SCORE_EP Stored Procedure

  • The SAS Embedded Process for Teradata supports only ISO-8859-1 (Latin-1) encoding for table metadata. Examples of table metadata include table and column names.
  • No specific parameter order is required. However, the INQUERY parameter must be the first parameter if its string is greater than 1,000 characters. Similarly, if the OPTIONS parameter string is greater than 1,000 characters, it must be the last parameter.
  • Database object names (for example, tables and columns) must be enclosed in double quotation marks if they are Teradata reserved words. Otherwise, quotation marks are optional.
  • Tables should be qualified with a database name. If a table name is not qualified with a database name, how the table name is resolved depends on which version of Teradata is used.
  • All parameters are passed as strings to the SAS_SCORE_EP stored procedure, so they must be enclosed in single quotation marks. To pass a single quotation mark as part of the SQL within a parameter, use two adjacent single quotation marks as shown in the following example:
    'INQUERY=select * from my_input_tbl where name like ''%Jones%''',

Teradata Scoring Files

When using the SAS Embedded Process, the %INDTD_PUBLISH_MODEL macro produces two scoring files for each model:
  • sasscore_modelname.ds2. This file contains code that is executed by the SAS_SCORE_EP stored procedure.
  • sasscore_modelname_ufmt.xml. This file contains user-defined formats for the scoring model that is being published. This file is used by the SAS_SCORE_EP stored procedure.
These files are published to the model table that you specify in the %INDTD_PUBLISH_MODEL macro. See Scoring File Examples for an example of each of these files.
A message that indicates whether the scoring files are successfully or not successfully created is printed to the SAS log.
Although you cannot view the scoring files directly, there are two ways to see the models whose files are created:
  • Log on to the database using BTEQ and submit an SQL statement. The following example assumes that the model table where the scoring files were published is register and the model name is reg1.
    bteq .logon myserver/myuserid,mypassword
    select modelname, modelowner, modeluuid from register 
       where modelname like '%reg1%';
    The model name, user ID, and date and time that the model files were published are listed.
  • From SAS, use SQL procedure code that produces output in the LST file. The following example assumes that the model name that you used to create the scoring files is reg.
    proc sql noerrorstop;
      connect to teradata (user=username password=xxxx server=myserver);
      
    select * from connection to teradata
          (select modelname,modelowner,modeluuid
             from sasmodeltablename
                where modelname like '%reg%');
    disconnect teradata;
    quit;
    
    You can also use the SASTRACE and SASTRACELOC system options to generate tracing information. For more information about these system options, see the SAS System Options: Reference.

Controlling the SAS Embedded Process

The SAS Embedded Process starts when a query is submitted. It continues to run until it is manually stopped or the database is shut down.
You can check the status of the SAS Embedded Process or disable it so that no new queries can be started. Use the following commands to perform those actions.
Action Performed
Command
Provides the status of the SAS Embedded Process.
CALL SQLJ.SERVERCONTROL ('SAS', 'status', :A); 1
Stops new queries from being started. Queries that are currently running continue to run until they are complete.
CALL SQLJ.SERVERCONTROL ('SAS', 'disable', :A); 1
Enables new queries to start running.
CALL SQLJ.SERVERCONTROL ('SAS', 'enable', :A); 1
Stops the SAS Embedded Process on all Teradata nodes
CALL SQLJ.SERVERCONTROL ('SAS', 'shutdown', :A);
1Note that the Languagename parameter, 'SAS', is required and must be uppercase. The Cmd parameter (for example, 'status'), must be lowercase.
Here is the sequence of operations to stop and then restart the SAS Embedded Process:
  1. Disable the SAS Embedded Process to stop new queries from being started.
    CALL SQLJ.SERVERCONTROL ('SAS', 'disable', :A);
  2. Query the status of the SAS Embedded Process until the status returns this message: Hybrid Server is disabled with no UDFs running.
    CALL SQLJ.SERVERCONTROL ('SAS', 'status', :A);
  3. Shutdown the SAS Embedded Process.
    CALL SQLJ.SERVERCONTROL ('SAS', 'shutdown', :A);
  4. Perform maintenance on the SAS Embedded Process, for example, install a hot fix or upgrade to a new version.
  5. Enable the SAS Embedded Process.
    CALL SQLJ.SERVERCONTROL ('SAS', 'enable', :A);
  6. Test the SAS Embedded Process. The SAS Embedded Process will start when the next SAS query that uses the SAS Embedded Process is sent to the database.