Publish Scoring Functions

What Is a Scoring Function?

The Publish Scoring Function feature of SAS Model Manager enables you to publish models that are associated with the DATA step score code type to a configured database. When you publish a scoring function for a project, SAS Model Manager exports the project's champion model to the SAS Metadata Repository. The SAS Scoring Accelerator then creates scoring functions, based on the score code of the champion model in the default version, that can be deployed inside the database. The scoring function is validated automatically against a default train table to ensure that the scoring results are correct. A scoring application or SQL code can then execute the scoring functions in the database. The scoring functions extend the database's SQL language and can be used in SQL statements like other database functions.
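To illustrate the idea that a published scoring function behaves like any other SQL function, here is a minimal sketch that uses SQLite's Python UDF mechanism as a stand-in. The function name and score logic are invented for illustration; the real scoring functions are generated from the champion model's score code by the SAS Scoring Accelerator.

```python
import sqlite3

# Hypothetical stand-in: a published scoring function can be called like any
# other SQL function. A toy score_credit() UDF plays the role of a function
# that the SAS Scoring Accelerator would deploy inside the database.
def score_credit(income, debt):
    """Toy score; real score code comes from the champion model."""
    ratio = debt / income if income else 1.0
    return round(1.0 - min(ratio, 1.0), 4)

conn = sqlite3.connect(":memory:")
conn.create_function("score_credit", 2, score_credit)
conn.execute("CREATE TABLE applicants (id INTEGER, income REAL, debt REAL)")
conn.executemany("INSERT INTO applicants VALUES (?, ?, ?)",
                 [(1, 50000.0, 10000.0), (2, 40000.0, 30000.0)])

# The scoring function is used in a SELECT like a built-in SQL function.
rows = conn.execute(
    "SELECT id, score_credit(income, debt) FROM applicants ORDER BY id"
).fetchall()
print(rows)  # [(1, 0.8), (2, 0.25)]
```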
The Scoring Function metadata tables are populated with information about the project and pointers to the scoring function. This feature enables users to review descriptions and definitions of the exported model. The audit logs track the history of the model's usage and any changes that are made to the scoring project.
The Publish Scoring Function also creates a MiningResult metadata object that is stored in the SAS Metadata Repository. A typical use of a MiningResult object is to answer input and output metadata queries that scoring applications make at application design time.
For more information about the SAS Scoring Accelerator, see the SAS In-Database Technology page on http://support.sas.com.
Note: For more information about the prerequisites before publishing a scoring function, see Prerequisites for Publishing a Scoring Function.
Here is a diagram that represents the relationship between SAS Model Manager and SAS Model Manager In-Database Support.
The Relationship between SAS Model Manager 3.1 and SAS Model Manager In-Database Support
SAS Model Manager 3.1 Architecture Topology Diagram
Here are descriptions of the diagram's components.
SAS Model Manager Client
The SAS Model Manager Client handles communication to and from SAS Model Manager. You use the SAS Model Manager Client to create projects and versions, import models, connect with data sources, validate models, run modeling reports, run scoring tasks, set project status, declare the champion model, and run performance tests.
SAS Model Manager Middle Tier Server
The SAS Model Manager Middle Tier Server is a collection of services, hosted by an application server, that orchestrates the communication and movement of data between all servers and components in the SAS Model Manager operational environment.
SAS Web Infrastructure Platform
The SAS Web Infrastructure Platform (or WIP) is a collection of middle tier services and applications that provides basic integration services. It is delivered as part of the Integration Technologies package. As such, all Business Intelligence applications, Data Integration applications, and SAS Solutions have access to the Web Infrastructure Platform as part of their standard product bundling.
SAS Content Server
The SAS Model Manager model repository and SAS Model Manager window tree configuration data and metadata are stored in the SAS Content Server. Communication between SAS Model Manager and the SAS Content Server uses the WebDAV communication protocol.
SAS Metadata Server
SAS Model Manager retrieves metadata about models from the SAS Metadata Server.
SAS Model Manager Server
The SAS Model Manager Server is a collection of macros on the SAS Workspace Server that generate SAS code to perform SAS Model Manager tasks.
SAS Scoring Accelerator
The SAS Scoring Accelerator creates scoring functions that can be deployed inside a database. Scoring functions are based on the project's champion model score code.
DBMS
The relational databases in the database management system (DBMS) serve as output data sources for SAS Model Manager.
Scoring Metadata Tables
These tables contain scoring metadata and serve as data sources in SAS Model Manager.
Database Tables
These database tables in relational databases serve as data sources for a scoring application.

Scoring Function Process Flow

This is an example of the process flow to publish a scoring function. For more information, see How to Publish a Scoring Function.
  1. From SAS Model Manager, you select Publish Scoring Function for the project that contains the champion model that you want to publish to a specific database.
  2. After you complete all the required information in the Publish Scoring Function window, SAS Model Manager establishes a JDBC connection to the database using the credentials that you entered. The user-defined part of the scoring function name is validated against the target database. If the user-defined part of the function name is not unique, an error message is displayed.
  3. If the scoring function name is validated successfully, the SAS Model Manager middle-tier server creates a MiningResult metadata object based on the champion model and exports that MiningResult metadata object to the folder that was specified in the Publish Scoring Function window.
  4. The SAS Model Manager middle-tier server then makes the user-defined formats accessible to the SAS Workspace Server. The format catalog is stored in the corresponding Resources folder.
  5. The SAS Model Manager publishing macro is called, which performs the following tasks:
    • calls the SAS Model Manager transform macro that creates a metadata XML file. This XML file is used by the model publishing macro.
    • calls the SAS model publishing macro, which creates the files that are needed to build the scoring functions and publishes the scoring functions with those files to the specified database.
    • validates scoring results by performing the following tasks:
      • creates a benchmark scoring result with the SAS Workspace Server using DATA step score code.
      • copies a scoring input data set to create an equivalent table.
        Note: The default train table that is specified in the properties of the version that contains the champion model is used as the scoring input data set during validation.
      • scores the model with the new scoring function using the new scoring table.
      • compares scoring results.
  6. The middle-tier server parses the SAS Workspace Server logs to extract the return code.
  7. The middle-tier server updates the scoring function metadata tables (for example, the table project_metadata). For more information, see Scoring Function Metadata Tables.
  8. The middle-tier server then creates a history entry in the SAS Model Manager project history.
    Note: A history entry is always created, whether or not the publishing job completes successfully.
  9. The middle-tier server updates the project user-defined properties with the function name that was entered in the Publish Scoring Function window.
    Note: The user-defined part of the function name is used as the default scoring function name in future scoring function publishing from the same SAS Model Manager project.
  10. A message indicates that the scoring function has been successfully created and that the scoring results have been successfully validated.
    Note: If the publishing job fails, an error message appears. Users can view the workspace logs that are accessible from the message box. If the scoring function is not published successfully, then the previous scoring function for the same project is used in subsequent scoring, based on the SAS Model Manager Java Scoring API.
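The comparison in step 5 can be sketched as follows. The tolerance and the record layout are illustrative assumptions; the actual comparison is performed by the SAS publishing macros.

```python
# Sketch of step 5's validation logic (assumed tolerance and record layout).
def validate_scores(benchmark, in_database, tol=1e-6):
    """Compare benchmark DATA step scores with in-database scoring results."""
    if set(benchmark) != set(in_database):
        return False  # the two runs scored different rows
    return all(abs(benchmark[k] - in_database[k]) <= tol for k in benchmark)

# benchmark: scores from DATA step score code on the SAS Workspace Server
# in_database: scores from the new scoring function on the copied table
benchmark = {101: 0.8732, 102: 0.1145}
in_database = {101: 0.8732, 102: 0.1145}
print(validate_scores(benchmark, in_database))  # True
```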

Prerequisites for Publishing a Scoring Function

The following prerequisites must be completed before you can publish a scoring function:
  • The user must have the proper authorization to publish approved models from SAS Model Manager to the database for SAS In-Database scoring.
  • The default scoring version for the project and the champion model for the default version must be set.
  • A predictive (classification or prediction) or segmentation model must have been selected for production scoring deployment via SAS Model Manager.
    Note: SAS Model Manager cannot publish scoring functions for PMML models. A model is supported when the score code is a SAS DATA step fragment and has been created by SAS Enterprise Miner.
  • A SAS metadata folder must have been created for models that are to be used for SAS In-Database scoring. The folder is created in SAS Management Console.
  • (Optional) A project user-defined property DbmsTable is defined for the default version of the SAS Model Manager project from which to publish the scoring function.
    Note: The DbmsTable property must be defined if you plan to use a scoring application or SQL code to score your model.
  • The JDBC driver must be accessible from the SAS Model Manager middle-tier server.
  • A database must have been configured to install scoring functions.
  • (Optional) Contact the scoring application development team to determine what scoring function name should be used for the SAS Model Manager project that contains the approved model.

Publish Scoring Function Field Descriptions

Here is a list of the field names and descriptions for the Publish Scoring Function window.
Publish Scoring Function Window
SAS metadata location
specifies a folder location in the SAS Metadata Repository.
Function name
specifies the name of the scoring function, which includes a prefix and a user-defined value. The prefix is 11 characters long and is in the format of Yyymmddnnn_.
  • Y is a literal character and is fixed for all prefixes.
  • yy is the two-digit year.
  • mm is the month and ranges from 01 to 12.
  • dd is the day and ranges from 01 to 31.
  • nnn is a counter that increments by 1 each time that a scoring function completes successfully. The value can range from 001 to 999.
  • _ is the underscore that ends the prefix.
The yymmdd value in the prefix is the GMT timestamp that identifies the date when you selected the Publish Scoring Function menu option. An example of a function name is Y081107001_user_defined_value. Here are the naming convention requirements:
  • The user-defined value is case insensitive. The maximum length of alphanumeric characters is determined by the database type that is selected. No spaces are allowed. An underscore is the only special character that can be included in the function name.
    • 19 alphanumeric characters for Teradata
    • 117 alphanumeric characters for Netezza and DB2
    • 52 alphanumeric characters for Greenplum
    UNIX Specifics: The user-defined portion of the function name in an AIX environment has a maximum length of 16 alphanumeric characters for Teradata.
  • The user-defined value part of the scoring function name is used by default for subsequent use of the Publish Scoring Function from the same project. Only the user-defined value in the scoring function name can be modified.
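The prefix construction and naming rules above can be sketched as follows. The helper names and the validation logic are illustrative assumptions, and the AIX-specific Teradata limit of 16 characters is not modeled.

```python
from datetime import datetime, timezone
import re

# Maximum length of the user-defined part, per database type (from the text).
MAX_LEN = {"Teradata": 19, "Netezza": 117, "DB2": 117, "Greenplum": 52}

def make_prefix(counter, when=None):
    """Build the 11-character prefix Yyymmddnnn_ from a GMT date and counter."""
    when = when or datetime.now(timezone.utc)
    return "Y{:%y%m%d}{:03d}_".format(when, counter)

def valid_user_part(name, dbtype):
    """Alphanumeric and underscore only, no spaces, within the database limit."""
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", name)) and len(name) <= MAX_LEN[dbtype]

prefix = make_prefix(1, datetime(2008, 11, 7, tzinfo=timezone.utc))
print(prefix + "user_defined_value")  # Y081107001_user_defined_value
print(valid_user_part("credit_score_v2", "Teradata"))  # True
```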
Database type
specifies the type of database that the scoring function will be published to.
Database server
specifies the name of the server where the database resides.
Database
specifies the name of the database.
User ID
specifies the user identification that is required to access the database.
Password
specifies the password that is associated with the User ID.
Server user ID (DB2 only)
specifies the user ID for SAS SFTP. This value enables you to access the machine on which you have installed the DB2 database. If you do not specify a value for Server user ID, the value of User ID is used as the user ID for SAS SFTP.
Schema (Greenplum only)
specifies the schema name for the database. The schema name is owned by the user that is specified in the User ID field. The schema must be created by your database administrator.
Initial wait time (DB2 only)
specifies the initial wait time in seconds for SAS SFTP to parse the responses and complete the SFTP batch file process.
Default: 15 seconds
FTP time out (DB2 only)
specifies the time-out value in seconds if SAS SFTP fails to transfer the files.
Default: 120 seconds
Compile database (Netezza only)
specifies the name of the database where the SAS_COMPILEUDF function is published.
Default: SASLIB
See Also: For more information about publishing the SAS_COMPILEUDF function, see the SAS In-Database Products: Administrator's Guide.
Jazlib database (Netezza only)
specifies the name of the database where the SAS 9.3 Formats Library for Netezza is published.
Default: SASLIB
Restriction: This argument is supported only for TwinFin systems.
Options
Validate scoring results
specifies to validate the scoring results when publishing the scoring function. This option creates a benchmark scoring result on the SAS Workspace Server using the DATA step score code. The scoring input data set is used to create an equivalent database table. Scoring is performed using the new scoring function and database table. The scoring results are then compared.
Note: The default training table is used as the scoring input data set during validation.
Keep scoring function if validation fails
specifies to save the scoring function if the validation of the scoring results fails. Saving the scoring function is useful for debugging if the scoring function validation fails.
Use champion model input
specifies to use the champion model input when publishing the scoring function instead of using the project input, which is the default. This is useful when the project input variables exceed the limitations for a database.
Note: Here are the limitations for the number of project input variables when publishing a scoring function for a champion model to a database:
  • 128 input variables for Teradata
  • 64 input variables for Netezza
  • 100 input variables for Greenplum
  • 90 input variables for DB2
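As a sketch, a hypothetical pre-check against these limits might look like this. The helper is illustrative only; SAS Model Manager performs its own checks when you publish.

```python
# Per-database limits on project input variables (from the note above).
INPUT_VAR_LIMITS = {"Teradata": 128, "Netezza": 64, "Greenplum": 100, "DB2": 90}

def needs_champion_input(n_project_inputs, dbtype):
    """True when the project input exceeds the database limit, in which case
    selecting the 'Use champion model input' option is the workaround."""
    return n_project_inputs > INPUT_VAR_LIMITS[dbtype]

print(needs_champion_input(70, "Netezza"))   # True  (70 > 64)
print(needs_champion_input(70, "Teradata"))  # False (70 <= 128)
```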
Protected mode (Teradata only)
specifies the mode of operation to use for the Publish Scoring Function. There are two modes of operation, protected and unprotected. You specify the mode by selecting or deselecting the Protected mode option. The default mode of operation is protected. Protected mode means that the macro code is isolated in a separate process from the Teradata database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in protected mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unprotected mode. You might see a significant performance advantage when you run the Publish Scoring Function in unprotected mode.
Fenced mode (DB2 only)
specifies the mode of operation to use for the Publish Scoring Function. There are two modes of operation, fenced and unfenced. You specify the mode by selecting or deselecting the Fenced mode option. The default mode of operation is fenced. Fenced mode means that the macro code is isolated in a separate process from the DB2 database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in fenced mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unfenced mode. You might see a significant performance advantage when you run the Publish Scoring Function in unfenced mode.
Display detailed log messages
provides detailed information, which includes warnings and error messages that occur when you publish a scoring function.
Sample size
specifies the size of the sample to use for validating the scoring function. The default value is 100. The maximum number of digits that are allowed is 8.

How to Publish a Scoring Function

To publish a scoring function, follow these steps:
  1. Verify that you have set the default scoring version for the project and have set the champion model for the default version. For more information, see Set a Default Version or Set a Champion Model.
  2. (Optional) Select the project name and enter a value for the DbmsTable user-defined property.
    Note: If you plan to use a scoring application or SQL code to score this project, you must first set the DbmsTable property to the name of the input table in your database that you want to use for scoring the champion model. When you publish a scoring function, the information that is associated with the input table in the database is updated to contain the value of the DbmsTable property. The scoring application or SQL code can then query the database for the input table name to use as the scoring input table.
    For more information, see User-Defined Properties.
    DBMS User-Defined Property
  3. Right-click the project's name in the Project Tree and select Publish Scoring Function. The Publish Scoring Function window appears.
    Publish Scoring Function Window
  4. To select a SAS metadata Location in which to publish the model, click Browse, select a folder name, and then click OK.
    SAS Metadata Repository Window
  5. Enter a name for the scoring function. The function name includes a prefix and a user-defined value. Only the user-defined value can be modified. The naming conventions are the following:
    • The user-defined value must be unique across all projects.
    • The maximum length for the user-defined value is determined by which database type is selected, and no spaces are allowed. It is recommended that you limit the user-defined value to 20 characters or fewer. Here are the limitations for the number of characters for each database:
      • 19 alphanumeric characters for Teradata
      • 117 alphanumeric characters for Netezza and DB2
      • 52 alphanumeric characters for Greenplum
    • The only special character that can be included in the function name is an underscore.
    Note: The user-defined value portion of the scoring function name is used by default for subsequent use of the scoring function from the same project. For more information, see User-Defined Properties.
  6. Select a database type from the drop-down list. The type of database that you choose determines which fields are required.
  7. Enter a text value for the following fields:
    • Database server
    • Database
    • User ID
    • Password
    • Server user ID (DB2 only)
    • Schema (Greenplum only)
    • Compile database (Netezza only)
    • Jazlib database (Netezza only)
  8. (DB2 only) Enter a numeric value for the fields that appear for the database type:
    • Initial wait time (in seconds)
    • FTP time out (in seconds)
  9. In the Options section, select the check box for the desired validation options that appear for the selected database type:
    • Validate scoring results
    • Keep scoring function if validation fails
    • Use champion model input
    • Protected mode (Teradata only) or Fenced mode (DB2 only)
    • Display detailed log messages
    Note: By default, the Validate scoring results and Protected mode or Fenced mode options are selected. The Keep scoring function if validation fails option is available for selection only when Validate scoring results is selected.
  10. Enter a numeric value for Sample size. If the value is null or zero, the default value of 100 is used. The maximum number of digits that are allowed is 8.
    Publish Scoring Function Window
  11. Click OK. A message is displayed that contains the scoring function name that will be published to the database.
    Note: The user-defined value of the Function name is validated against the target database. If the user-defined value has already been used by another project to publish a model to the target database, an error message is displayed.
    Function Name Verification Message
  12. Click OK. A message indicating that the scoring function has been created successfully or unsuccessfully is displayed.
    Scoring Function Created Message
  13. Click Details to display information about the publish scoring function actions or error messages.
    Scoring Function Created Message with Detailed Log
  14. Click OK to complete the publishing process.

Log Messages

A user can view the log file in the Details window when publishing a scoring function. As an alternative, a user can view the ScoringFunction.log file in the Project Tree. The log file records the time that the process started, who initiated the process, and when the project was published. Error messages are also recorded. The log file provides an audit trail of all relevant actions in the publishing process.
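As an illustration of how ERROR lines can be pulled from such a log (step 6 of the process flow mentions that the middle-tier server parses the SAS Workspace Server logs), here is a minimal sketch. The log excerpt is hypothetical, and the real ScoringFunction.log format may differ.

```python
# Hypothetical log excerpt; the real ScoringFunction.log content may differ.
log_text = """\
NOTE: Publishing process started by mmuser at 07NOV2008:14:32:01 GMT.
NOTE: Champion model exported.
ERROR: Function name Y081107001_credit already exists in target database.
NOTE: Publishing process ended.
"""

def extract_errors(text):
    """Collect ERROR lines from a SAS-style log."""
    return [line for line in text.splitlines() if line.startswith("ERROR:")]

errors = extract_errors(log_text)
print(len(errors))  # 1
```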

Scoring Function Metadata Tables

The following tables are created in the database when you publish a scoring function:
project_metadata
provides information about the scoring project.
model_metadata
provides information about the champion model, such as the function name and signature.
project_model_info
maps a project to the champion model and provides information about the current active scoring model.
update_log
provides information about the update service process such as when it was started, errors that occurred, and maintenance messages about what has occurred during the publishing process. It also serves as the main audit trail.
rollback_error_log
records fatal errors that occur in the publishing process.
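As a sketch of how a scoring application might use these tables, the following example joins hypothetical versions of project_metadata and project_model_info to look up the current scoring function and its input table. All column names are illustrative assumptions; the actual table definitions are created by SAS Model Manager in the target database.

```python
import sqlite3

# Hypothetical table layouts; the real columns are defined by SAS Model Manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_metadata (project_name TEXT, input_table TEXT);
CREATE TABLE project_model_info (project_name TEXT, function_name TEXT);
INSERT INTO project_metadata VALUES ('CreditRisk', 'APPLICANTS');
INSERT INTO project_model_info VALUES ('CreditRisk', 'Y081107001_credit');
""")

# A scoring application can join the tables to find the current scoring
# function and the input table (the DbmsTable value) for a project.
row = conn.execute("""
    SELECT m.function_name, p.input_table
    FROM project_model_info m
    JOIN project_metadata p ON p.project_name = m.project_name
    WHERE m.project_name = 'CreditRisk'
""").fetchone()
print(row)  # ('Y081107001_credit', 'APPLICANTS')
```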