Publishing Models to a Database

About Publishing Models to a Database

SAS Model Manager enables you to publish the project champion model and challenger models that are associated with the DATA Step score code type to a configured database. SAS Model Manager uses the SAS Scoring Accelerator and SAS/ACCESS interface to the database to publish models to the database. The Scoring Accelerator takes the models from SAS Model Manager and translates them into scoring files or functions that can be deployed inside the database. After the scoring functions are published using the SAS/ACCESS interface to the database, the functions extend the database’s SQL language and can be used in SQL statements such as other database functions. After the scoring files are published, they are used by the SAS Embedded Process to run the scoring model.
If the scoring function publish method is chosen, the scoring metadata tables in the database are populated with information about the project and pointers to the scoring function. This feature enables users to review descriptions and definitions of the published model. The audit logs track the history of the model's usage and any changes that are made to the scoring project.
For more information about the SAS Scoring Accelerator, see the SAS In-Database Technology page available at http://support.sas.com.
Note: For more information about the prerequisites before publishing models to a database, see Prerequisites for Publishing to a Database.
Here is a diagram that represents the relationship between SAS Model Manager and SAS Model Manager In-Database Support.
The Relationship between SAS Model Manager 12.3 and SAS Model Manager In-Database Support
SAS Model Manager 12.3 Architecture Topology Diagram
Here are descriptions of the diagram's components.
SAS Model Manager Client
The SAS Model Manager Client handles communication to and from SAS Model Manager. You use the SAS Model Manager Client to create projects and versions, import models, connect with data sources, validate models, run modeling reports, run scoring tasks, set project status, declare the champion model, and run performance tests.
SAS Model Manager Middle Tier Server
The SAS Model Manager Middle Tier Server is a collection of services that are hosted by an application server that orchestrates the communication and movement of data between all servers and components in the SAS Model Manager operational environment.
SAS Web Infrastructure Platform
The SAS Web Infrastructure Platform (or WIP) is a collection of middle tier services and applications that provides basic integration services. It is delivered as part of the Integration Technologies package. As such, all Business Intelligence applications, Data Integration applications, and SAS Solutions have access to the Web Infrastructure Platform as part of their standard product bundling.
SAS Content Server
The SAS Model Manager model repository and SAS Model Manager project tree configuration data and metadata are stored in the SAS Content Server. Communication between SAS Model Manager and the SAS Content Server uses the WebDAV communication protocol.
SAS Metadata Server
SAS Model Manager retrieves metadata about models from the SAS Metadata Server.
SAS Model Manager Server
The SAS Model Manager Server is a collection of macros on the SAS Workspace Server that generate SAS code to perform SAS Model Manager tasks.
SAS Scoring Accelerator
The SAS Scoring Accelerator creates scoring functions or model files that can be deployed inside a database. The scoring functions or model files are based on the project's champion model score code or challenger model score code.
DBMS
The relational databases in the database management system (DBMS) serve as output data sources for SAS Model Manager.
Scoring Metadata Tables
These tables contain metadata, and the tables are populated in the database when you publish a scoring function in SAS Model Manager.
Database Tables
These database tables in relational databases serve as data sources for a scoring application.

Process Flow

This is an example of the process flow to publish a scoring model to a database. For more information, see How to Publish Models to a Database.
  1. From SAS Model Manager, you select the Publish Modelsthen selectto a Database for the project that contains the champion model or challenger model that you want to publish to a specific database. For more information, see How to Publish Models to a Database.
  2. After you select the publish method and complete all the required information to publish the model to a database, SAS Model Manager establishes a connection to the database using the credentials that were entered. The publish name is validated against the target database. If the publish name is not unique, an error message is displayed.
  3. The SAS Model Manager middle-tier server then makes the user-defined formats accessible to the SAS Workspace Server. The format catalog is stored in the corresponding Resources folder.
  4. The SAS Model Manager publishing macro is called, which performs the following tasks:
    • calls the SAS Model Manager transform macro that creates a metadata XML file. This XML file is used by the model publishing macro.
    • calls the SAS model publishing macro, which creates the files that are needed to build the scoring functions or model files, and publishes the scoring functions or model files with those files to the specified database.
    • validates scoring results by performing the following tasks:
      • creates a benchmark scoring result with the SAS Workspace Server using DATA step score code.
      • copies a scoring input data set to create an equivalent table.
        Note: The default train table that is specified in the properties of the published model is used as the scoring input data set during validation.
      • scores the model with the new scoring function or model files using the new scoring table.
      • compares scoring results.
  5. The middle-tier server parses the SAS Workspace Server logs to extract the return code.
  6. The middle-tier server updates the scoring metadata tables (for example, table project_metadata). For more information see, Scoring Function Metadata Tables.
    Note: This step is performed only for the scoring function publish method and the metadata usage option is enabled in SAS Management Console.
  7. The middle-tier server then creates a history entry in the SAS Model Manager project history.
  8. The middle-tier server updates the project user-defined properties with the publish name that was entered in the Publish Models to a Database window.
    Note: For more information about the user-defined properties that are created when publishing, see SAS User-Defined Properties.
  9. A message indicates that the scoring function or model files has been successfully created and that the scoring results have been successfully validated.
    Note: If the publishing job fails, an error message appears. Users can view the workspace logs that are accessible from a folder that is created for the publish model in the Publish Results folder in the Project Tree.

Prerequisites for Publishing to a Database

The following prerequisites must be completed before users can publish a model scoring function using the scoring function publish method, or publish a model’s scoring files using the SAS Embedded Process publish method:
  • The user must have the proper authorization to publish approved models from SAS Model Manager to the database for SAS In-Database scoring.
  • The champion model for the project must be set.
  • A predictive (classification or prediction) or segmentation model must have been selected for production scoring deployment via SAS Model Manager.
    Note: SAS Model Manager can publish only the models that are associated with the DATA step score code type to a database. Models with a score code type of SAS Program or PMML cannot be published to a database.
  • A database must have been configured to install scoring functions or model scoring files.
  • If the model contains user-defined formats, a file that contains the user-defined formats must be attached to the Resources folder.
  • The following prerequisites are only for the scoring function publish method.
    • (Optional) A project user-defined property DbmsTable is defined for the default version of the SAS Model Manager project from which to publish the scoring function.
      Note: The DbmsTable property must be defined if you plan to use a scoring application or SQL code to score your model.
    • The JDBC driver must be accessible from the SAS Model Manager middle-tier server when using the scoring function publish method.
    • The scoring function metadata tables are required in the target database if the Metadata usage option is enables in SAS Management Console.

Make User-Defined Formats Available When Publishing Models to a Database

In order to publish models with user-defined formats to a database using the Publish Models to a Database feature, you must make the user-defined formats available to SAS Model Manager.
To make the user-defined formats available for publishing:
  1. Translate the user-defined formats SAS data set (formats.sas7bcat) that was created with the model into a formats.cport file.
    Here is an example:
    filename tranfile "C:\formats.cport";
    libname source "C:\myformats";
                     
    proc cport library=source file=tranfile memtype=catalog;
    run;
    quit;
    
  2. Attach the formats.cport file to the Resources folder within the version that contains the project champion model or challenger models.
  3. Send a request to the SAS administrator and ask them to either put the user-defined formats SAS data set (formats.sas7bcat) in the \\SASConfigDirectory\Lev1\SASApp\SASEnvironment\SASFormats directory or add the LIBNAME definition for the formats library to the \\SASConfigDirectory\Lev1\SASApp\appserver_autoexec_usermods file.
    Here is an example of a LIBNAME definition:
    libname mylib "C:\myformats";
    options fmtsearch = (mylib.formats);
    

Publish Models to a Database Field Descriptions

Here is a list of the field names and descriptions for the Publish Models to a Database window.
Publish Models to a Database window
Database type
specifies the type of database to which the scoring function or model scoring files are published.
Publish method
specifies the method to use when publishing the scoring function or model files to the database.
Publish name
specifies the name to use when publishing a scoring function or model files to the database. The publish name is a user-defined value that can be modified. The SAS Embedded Process publish method uses the publish name as the model name to publish the model files to the database. The scoring function publish method has a system-generated prefix and the publish name that makes up the scoring function name. These are used to publish the model scoring function.
The prefix portion of the scoring function name is 11 characters long and is in the format of Yyymmddnnn_.
  • Y is a literal character and is fixed for all prefixes.
  • yy is the two-digit year.
  • mm is the month and ranges from 01 to 12.
  • dd is the day and ranges from 01 to 31.
  • nnn is a counter that increments by 1 each time that a scoring function completes successfully. The value can range from 001 to 999.
  • _ is the underscore that ends the prefix.
The yymmdd value in the prefix is the GMT timestamp that identifies the date on which you selected the Publish Modelsthen selectto a Database menu option. An example of a function name is Y081107001_user_defined_value. Here are the naming convention requirements:
Here are the naming convention requirements for the publish name:
  • The user-defined value is case insensitive. The maximum length of alphanumeric characters is determined by the database type and publish method that is selected. No spaces are allowed. An underscore is the only special character that can be included in the publish name.
  • The recommended maximum lengths of the publish name for the scoring function publish method are the following:
    • 19 alphanumeric characters for Teradata
    • 32 alphanumeric characters for Netezza, Greenplum, and DB2
    UNIX Specifics: The publish name (user-defined) portion of the function name in an AIX environment has a maximum length of 16 alphanumeric characters for Teradata.
  • The recommended maximum length of the publish name for the SAS Embedded Process publish method is 32 alphanumeric characters for all database types. The database types that are currently supported by SAS Model Manager are Teradata, Oracle, Greenplum, and DB2.
The value of the publish name is validated against the target database, when the option Replace scoring files that have the same publish name is not selected for the SAS Embedded Process publish method. If the publish name is not unique, an error message is displayed.
Replace scoring files that have the same publish name
specifies to replace the model scoring files that have the same publish name when you are using the SAS Embedded Process publish method. The value of the publish name is validated against the target database when this option is not selected. If the publish name is not unique, an error message is displayed.
Specify an identifier to add to the database table for each model
specifies the value of the identifier that is added to each model in the database so that the Database administrator or other users can query the database. The default value is the project name. This option is available only for the SAS Embedded Process publish method.
Database server
specifies the name of the server where the database resides.
Database
specifies the name of the database.
User ID
specifies the user identification that is required to access the database.
Password
specifies the password that is associated with the User ID.
Server user ID (DB2 only)
specifies the user ID for SAS SFTP. This value enables you to access the machine on which you have installed the DB2 database. If you do not specify a value for Server user ID, the value of User ID is used as the user ID for SAS SFTP.
Schema(Greenplum, Oracle, and DB2)
specifies the schema name for the database. The schema name is owned by the user that is specified in the User ID field. The schema must be created by your database administrator.
Initial wait time (DB2 only)
specifies the initial wait time in seconds for SAS SFTP to parse the responses and complete the SFTP –batch file process.
Default: 15 seconds
FTP time out (DB2 only)
specifies the time-out value in seconds if SAS SFTP fails to transfer the files.
Default: 120 seconds
Compile database (Netezza only)
specifies the name of the database where the SAS_COMPILEUDF function is published.
Default: SASLIB
See Also: For more information about publishing the SAS_COMPILEUDF function, see the SAS In-Database Products: Administrator's Guide.
Jazlib database (Netezza only)
specifies the name of the database where the SAS 9.3 Formats Library for Netezza is published.
Default: SASLIB
Options
Validate scoring results
specifies to validate the scoring results when publishing a model scoring function or model scoring files. This option creates a benchmark scoring result on the SAS Workspace Server using the DATA Step score code. The scoring input data set is used to create an equivalent database table. Scoring is performed using the new scoring function or model scoring files and database table. The scoring results are then compared.
Note: The default training table is used as the scoring input data set during validation.
Keep scoring function if validation fails(scoring function) or Keep scoring files if validation fails (SAS Embedded Process)
specifies to save the scoring function or model scoring files if the validation of the scoring results fails. Saving the scoring function or model scoring files is useful for debugging if validation fails.
Use model input
specifies to use the selected model input when publishing the scoring function or model files instead of using the project input, which is the default. This is useful when the project input variables exceed the limitations for a database.
Here are the limitations for the number of model input variables when publishing a champion model or challenger model to a database:
Database Type
SAS Embedded Process
Scoring Function
Teradata
If you use Teradata version 13.1 or 14.0, the maximum is 1024. If you use the SAS Embedded Process and Teradata version 14.10, the maximum is 2048.
128
Netezza
1600
64
Oracle
1000
Not applicable
Greenplum
1660
100
DB2
The maximum depends on the page size of the database table space. For a 4K page size database, the limit is 500. If you have it configured for any of the larger page sizes (8K, 16K, 32K), then the limit is 1012.
90
Protected mode (Teradata only)
specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, protected and unprotected. You specify the mode by selecting or deselecting the Protected mode option. The default mode of operation is protected. Protected mode means that the macro code is isolated in a separate process from the Teradata database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in protected mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unprotected mode. You might see a significant performance advantage when you run the Publish Scoring Function in unprotected mode.
Fenced mode (DB2 and Netezza only)
specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, fenced and unfenced. You specify the mode by selecting or deselecting the Fenced mode option. The default mode of operation is fenced. Fenced mode means that the macro code is isolated in a separate process from the DB2 database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in fenced mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unfenced mode. You might see a significant performance advantage when you run the Publish Scoring Function in unfenced mode.
Display detailed log messages
provides detailed information, which includes warnings and error messages that occur when you publish a scoring function or scoring model files.
Sample size
specifies the size of the sample to use for validating the scoring function or model files. The default value is 100. The maximum number of digits that are allowed is 8.

How to Publish Models to a Database

To publish a model to a database:
  1. Verify that you have set the champion model and challenger models that you want to publish. For more information, see Set a Champion Model.
  2. (Optional) Select the project name and enter a value for the DbmsTable user-defined property.
    Note: If you plan to use scoring application or SQL Code to score this project, you can set the DbmsTable property to the name of input table in your database that you want to use for scoring the champion model. When you publish a scoring function or model files, the information that is associated with the input table in the database is updated to contain the value of the DbmsTable property. The scoring application or SQL code can then query the database for the input table name to use as the scoring input table.
    For more information, see User-Defined Properties.
    DBMS User-Defined Property
  3. Right-click the project's name in the Project Tree and select Publish Modelsthen selectto a Database from the pop-up menu. The Publish Models to a Database window appears.
    Publish Models to a Database Window
  4. Select a database type and publish method. The type of database and the publish method that you choose determine which database settings and options are required.
    Operating Environment Information: The SAS Embedded Process can be used with the SAS Scoring Accelerator for Netezza to run scoring models with the release of SAS 9.4. The SAS Administrator can enable Netezza support for SAS Model Manager so that the Netezza database type appears when using the SAS Embedded Process publish method. For more information, see the SAS Model Manager: Administrator's Guide.
  5. Select the check box next to the models that you want to publish.
  6. Enter a publish name for each model that you selected to publish. The scoring function publish method has a system-generated prefix and the publish name. These are used to publish the model scoring function. The SAS Embedded Process publish method uses only the publish name to publish the model files to the database. The publish name is a user-defined value that can be modified.
    Here are the naming convention requirements:
    • The user-defined value is case insensitive. The maximum length of alphanumeric characters is determined by the database type and publish method that is selected. No spaces are allowed. An underscore is the only special character that can be included in the publish name.
    • The recommended maximum lengths of the publish names for the scoring function publish method are the following:
      • 19 alphanumeric characters for Teradata
      • 32 alphanumeric characters for Netezza, Greenplum, and DB2
      UNIX Specifics: The publish name (user-defined) portion of the function name in an AIX environment has a maximum length of 16 alphanumeric characters for Teradata.
    • The maximum length of the publish name for the SAS Embedded Process publish method is 128 alphanumeric characters for all databases. However, the recommended maximum length of the publish name is 32 alphanumeric characters for all database types. The database types that are currently supported by SAS Model Manager are Teradata, Oracle, Netezza, Greenplum, and DB2.
    Note: The publish name for each model is reserved by default for subsequent use of the publishing models for a project.
  7. (Optional) Select the replace scoring files that have the same publish name check box. This option is available only for the SAS Embedded Process publish method.
  8. Specify an identifier to add to the database target table for each model. The default value is the project name. This option is available only for the SAS Embedded Process publish method.
  9. Enter a value for the database settings that appear for the selected database type.
    Here are the available database settings according to the publish method and database type:
    Database Settings
    SAS Embedded Process
    Scoring Function
    Database server
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Database
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    User ID
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Password
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Server user ID
    Not applicable
    DB2
    Compile database
    Not applicable
    Netezza
    Jazlib database
    Not applicable
    Netezza
    Schema
    • Oracle
    • Greenplum
    • DB2
    • Greenplum
    • DB2
    Initial wait time (in seconds)
    Not applicable
    DB2
    FTP time out (in seconds)
    Not applicable
    DB2
  10. Click More Options. The Database Options window appears.
    Database Options window
    Select the check box for the desired validation options that appear for the selected database type:
    • Validate scoring results
    • Keep scoring files if validation fails (SAS Embedded Process) or Keep scoring function if validation fails (scoring function)
    • Display detailed log messages
    • Use model input
    • Protected mode (Teradata scoring function option) or Fenced mode (DB2 and Netezza scoring function option)
    Note: By default, the Validate scoring results and Use model input options are selected for both publish methods. The Protected mode or the Fenced mode options are selected by default for the scoring function publish method.
  11. Enter a numeric value for Sample Size. The default value for sample size is 100 if the value is null or zero. The maximum number of digits that are allowed is 8.
    Publish Scoring Function Window
  12. Click OK. A message is displayed to indicate whether the models were published to the database successfully or not.
    Publish Name Verification Message
    Note: The value of the publish name is validated against the target database, when the option Replace scoring files that have the same publish name is not selected for the SAS Embedded Process publish method. If the publish name is not unique, an error message is displayed.
    Publish Name Validation Error Message
  13. Click Close to complete the publishing process.

Log Messages

A user can view the log file after publishing a scoring model to a database. The Publish Results folder in the Project Tree contains a folder for each model that was published. The publish name is used to create a folder for each model that is published. That folder contains the ScoringResults.log file. The time at which the process started, details about who initiated the process, and the time at which the project was published are recorded. Error messages are also recorded in the log file. The log file provides an audit trail of all relevant actions in the publishing process.
Publish to Database Results Log

Scoring Function Metadata Tables

If the metadata tables are created and configured for use in SAS Management Console, the following tables are populated in the database when you are publishing a scoring function:
project_metadata
provides information about the scoring project.
model_metadata
provides information about the champion model, such as the function name and signature that are stored within this table.
project_model_info
maps a project to the champion model, as well as provides information about the current active scoring model.
update_log
provides information about the update service process such as when it was started, errors that occurred, and maintenance messages about what has occurred during the publishing process. It also serves as the main audit trail.
rollback_error_log
records fatal errors that occur in the publishing process.