Publishing Models to a Database

About Publishing Models to a Database

SAS Model Manager enables you to publish the project champion model and challenger models that are associated with the DATA Step score code type to a configured database. SAS Model Manager uses the SAS Scoring Accelerator and SAS/ACCESS interface to the database to publish models to the database. The Scoring Accelerator takes the models from SAS Model Manager and translates them into scoring files or functions that can be deployed inside the database. After the scoring functions are published using the SAS/ACCESS interface to the database, the functions extend the database’s SQL language and can be used in SQL statements such as other database functions. After the scoring files are published, they are used by the SAS Embedded Process to run the scoring model.
If the scoring function publish method is chosen, the scoring metadata tables in the database are populated with information about the project and pointers to the scoring function. This feature enables users to review descriptions and definitions of the published model. The audit logs track the history of the model's usage and any changes that are made to the scoring project.
For more information about the SAS Scoring Accelerator, see the SAS In-Database Technology page available at http://support.sas.com.

Process Flow

This is an example of the process flow to publish a scoring model to a database.
  1. Select a project and click Publish Model.
    Publish Models - DB2 Example
  2. Select a database from the Publish destination list for the project that contains the champion model or challenger model that you want to publish to a specific database.
  3. After you select the publish method and complete all the required information to publish the model to a database, SAS Model Manager establishes a connection to the database using the credentials that were entered. The publish name is validated against the target database. If the publish name is not unique, an error message is displayed.
  4. The SAS Model Manager middle-tier server then makes the user-defined formats accessible to the SAS Workspace Server. The format catalog is stored on the SAS Content Server. You can attach a portable formats file on the Versions page.
  5. When the SAS Model Manager publishing macro is called, it performs the following tasks:
    • calls the transform macro that creates a metadata XML file. This XML file is used by the model publishing macro.
    • calls the SAS model publishing macro, which creates the files that are needed to build the scoring functions or model files, and publishes the scoring functions or model files with those files to the specified database.
    • (Optional) validates scoring results by performing the following tasks:
      • creates a benchmark scoring result with the SAS Workspace Server using DATA step score code.
      • copies a scoring input data set to create an equivalent table.
        Note: The default train table that is specified in the properties of the published model is used as the scoring input data set during validation.
      • scores the model with the new scoring function or model files using the new scoring table.
      • compares scoring results.
  6. The middle-tier server parses the SAS Workspace Server logs to extract the return code.
  7. The middle-tier server updates the scoring metadata tables (for example, table project_metadata).
    Note: This step is performed only for the scoring function publish method and the metadata usage option is enabled in SAS Management Console.
  8. The middle-tier server then creates a history entry in the SAS Model Manager project history.
  9. The middle-tier server updates the project user-defined properties with the publish name that was entered in the Publish Model window.
  10. A message indicates that the scoring function or model files has been successfully created and that the scoring results have been successfully validated.
    Note: If the publishing job fails, an error message appears. You can view the workspace logs that are accessible from a folder that is created for the publish model on the Models page or in the Job History tab on the History page.

Prerequisites for Publishing to a Database

The following prerequisites must be completed before users can publish a model scoring function using the scoring function publish method, or publish a model’s scoring files using the SAS Embedded Process publish method:
  • The user must have the proper authorization to publish approved models from SAS Model Manager to the database for SAS In-Database scoring.
  • The champion model for the project must be set.
  • A predictive (classification or prediction) or segmentation model must have been selected for production scoring deployment via SAS Model Manager.
    SAS Model Manager can publish to a database only the models that are associated with the DATA step score code type. Models with a score code type of SAS Program or PMML cannot be published to a database.
    The score code component file (score.sas) is DATA step score code and is used as input by the SAS Scoring Accelerator when publishing a model to a database. When you use the scoring function publish method, some SAS language elements and syntax are not supported when you create or modify your score code. Only the SAS language elements and syntax that are required to run critical data transformations and model scoring functions are available. If you use a statement or function that is not supported, an error occurs and your model is not published to the database. For more information, see Considerations When Creating or Modifying DATA Step Score Code in SAS In-Database Products: User's Guide.
  • A database must have been configured to install scoring functions or model scoring files.
  • If the model contains user-defined formats, a file that contains the user-defined formats must be attached to the version and stored in a format catalog.
  • The following prerequisites are only for the scoring function publish method.
    • (Optional) A project user-defined property DbmsTable is defined for the default version of the project from which to publish the scoring function.
      Note: The DbmsTable property must be defined if you plan to use a scoring application or SQL code to score your model.
    • The JDBC driver must be accessible from the middle-tier server when using the scoring function publish method.
    • The scoring function metadata tables are required in the target database if the Metadata usage option is enables in SAS Management Console.

Make User-Defined Formats Available When Publishing Models to a Database

In order to publish models with user-defined formats to a database, you must make the user-defined formats available to SAS Model Manager.
To make the user-defined formats available for publishing:
  1. Translate the user-defined formats SAS catalog (formats.sas7bcat) that was created with the model into a formats.cport file.
    Here is an example:
    filename tranfile "C:\formats.cport";
    libname source "C:\myformats";
                     
    proc cport library=source file=tranfile memtype=catalog;
    run;
    quit;
    
  2. Attach the formats.cport file to the version that contains the project champion model or challenger models.
  3. Send a request to the SAS administrator and ask them to either put the user-defined formats catalog (formats.sas7bcat) in the \\SASConfigDirectory\Lev1\SASApp\SASEnvironment\SASFormats directory or add the LIBNAME definition for the formats library to the \\SASConfigDirectory\Lev1\SASApp\appserver_autoexec_usermods file.
    Here is an example of a LIBNAME definition:
    libname mylib "C:\myformats";
    options fmtsearch = (mylib.formats);
    

How to Publish Models to a Database

To publish a model to a database:
  1. Select a project and click Publish Models.
  2. Select a database from the publish destination list. Specifies the type of database to which the scoring function or model scoring files are published.
  3. Select a publish method. Specifies the method to use when publishing the scoring function or model files to the database.
  4. Select one or more models to publish from the models list.
  5. Specify a Publish Name for each model. Specifies the name to use when publishing a scoring function or model files to the database. The publish name is a user-defined value that can be modified. The SAS Embedded Process publish method uses the Publish Name as the model name to publish the model files to the database. The scoring function publish method has a system-generated Prefix and the Publish Name that makes up the scoring function name. These are used to publish the model scoring function. The prefix portion of the scoring function name is 11 characters long and is in the format of Yyymmddnnn_:
    • Y is a literal character and is fixed for all prefixes.
    • yy is the two-digit year.
    • mm is the month and ranges from 01 to 12.
    • dd is the day and ranges from 01 to 31.
    • nnn is a counter that increments by 1 each time that a scoring function completes successfully. The value can range from 001 to 999.
    • _ is the underscore that ends the prefix.
    The yymmdd value in the prefix is the GMT timestamp that identifies the date on which you published the model. An example of a function name is Y081107001_user_defined_value. Here are the naming convention requirements:
    Here are the naming convention requirements for the publish name:
    • The user-defined value is case insensitive. The maximum length of alphanumeric characters is determined by the database type and publish method that is selected. No spaces are allowed. An underscore is the only special character that can be included in the publish name.
    • The recommended maximum lengths of the publish name for the scoring function publish method are the following:
      • 19 alphanumeric characters for Teradata
      • 32 alphanumeric characters for Netezza, Greenplum, and DB2
      UNIX Specifics: The publish name (user-defined) portion of the function name in an AIX environment has a maximum length of 16 alphanumeric characters for Teradata.
    • The recommended maximum length of the publish name for the SAS Embedded Process publish method is 32 alphanumeric characters for all database types. The database types that are currently supported by SAS Model Manager are Teradata, Oracle, Greenplum, and DB2.
    The value of the publish name is validated against the target database, when the option Replace scoring files that have the same publish name is not selected for the SAS Embedded Process publish method. If the publish name is not unique, an error message is displayed.
    Note: The default format of the publish name is configured by the SAS administrator.
  6. (Optional) Select whether to Replace scoring files that have the same publish name. Specifies to replace the model scoring files that have the same publish name when you are using the SAS Embedded Process publish method. The value of the publish name is validated against the target database when this option is not selected. If the publish name is not unique, an error message is displayed.
  7. Specify an identifier to add to the database target table for each model. Specifies the value of the identifier that is added to each model in the database so that the Database administrator or other users can query the database. The default value is the project name. This option is available only for the SAS Embedded Process publish method.
  8. (Optional) Select whether to Validate scoring results. Specifies to validate the scoring results when publishing a model scoring function or model scoring files. This option creates a benchmark scoring result on the SAS Workspace Server using the DATA Step score code. The scoring input data set is used to create an equivalent database table. Scoring is performed using the new scoring function or model scoring files and database table. The scoring results are then compared. If selected, click Browse to navigate to the appropriate training table. The default train table that is specified in the properties of the published model is used by default.
  9. Specify the database settings.
    Here are the available database settings according to the publish method and database type:
    Database Settings
    SAS Embedded Process
    Scoring Function
    Server
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Database
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Instance number
    SAP HANA
    Not applicable
    User ID
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Hadoop
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Password
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Hadoop
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Server user ID
    Not applicable
    DB2
    Compile database
    Not applicable
    Netezza
    Jazlib database
    Not applicable
    Netezza
    Schema
    • Oracle
    • Greenplum
    • DB2
    • SAP HANA
    • Greenplum
    • DB2
    Initial wait time (in seconds)
    Not applicable
    DB2
    FTP time out (in seconds)
    Not applicable
    DB2
    Server and port number
    Hadoop
    Not applicable
    Directory path
    Hadoop
    Not applicable
    MapReduce server and port number
    Hadoop
    Not applicable
    For a description of each database setting, see Descriptions of Database Settings.
  10. Click More Options to specify other options for the database.
    Keep scoring function if validation fails (scoring function) or Keep scoring files if validation fails (SAS Embedded Process)
    specifies to save the scoring function or model scoring files if the validation of the scoring results fails. Saving the scoring function or model scoring files is useful for debugging if validation fails.
    Sample size
    specifies the size of the sample to use for validating the scoring function or model files. The default value is 100. The maximum number of digits that are allowed is 8.
    Display detailed log messages
    provides detailed information, which includes warnings and error messages that occur when you publish a scoring function or scoring model files.
    Use model input
    specifies to use the selected model input when publishing the scoring function or model files instead of using the project input, which is the default. This is useful when the project input variables exceed the limitations for a database.
    Here are the limitations for the number of model input variables when publishing a champion model or challenger model to a database:
    Database Type
    SAS Embedded Process
    Scoring Function
    DB2
    The maximum depends on the page size of the database table space. For a 4K page size database, the limit is 500. If you have it configured for any of the larger page sizes (8K, 16K, 32K), then the limit is 1012.
    90
    Greenplum
    1660
    100
    Hadoop
    No limit
    Not applicable
    Netezza
    1600
    64
    Oracle
    1000
    Not applicable
    SAP HANA
    1000
    Not applicable
    Teradata
    If you use Teradata version 13.1 or 14.0, the maximum is 1024. If you use the SAS Embedded Process and Teradata version 14.10, the maximum is 2048.
    128
    Protected mode (Teradata only)
    specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, protected and unprotected. You specify the mode by selecting or deselecting the Protected mode option. The default mode of operation is protected. Protected mode means that the macro code is isolated in a separate process from the Teradata database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in protected mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unprotected mode. You might see a significant performance advantage when you run the Publish Scoring Function in unprotected mode.
    Fenced mode (DB2 and Netezza only)
    specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, fenced and unfenced. You specify the mode by selecting or deselecting the Fenced mode option. The default mode of operation is fenced. Fenced mode means that the macro code is isolated in a separate process from the DB2 database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in fenced mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unfenced mode. You might see a significant performance advantage when you run the Publish Scoring Function in unfenced mode.
  11. Click Publish.

Descriptions of Database Settings

The following are descriptions of the database settings that are used for publishing models.
Database server
specifies the name of the server where the database resides.
Database
specifies the name of the database.
User ID
specifies the user identification that is required to access the database.
Password
specifies the password that is associated with the User ID.
Server user ID (DB2 only)
specifies the user ID for SAS SFTP. This value enables you to access the machine on which you have installed the DB2 database. If you do not specify a value for Server user ID, the value of User ID is used as the user ID for SAS SFTP.
Schema(Greenplum, Oracle, and DB2)
specifies the schema name for the database. The schema name is owned by the user that is specified in the User ID field. The schema must be created by your database administrator.
Initial wait time (DB2 only)
specifies the initial wait time in seconds for SAS SFTP to parse the responses and complete the SFTP –batch file process.
Default: 15 seconds
FTP time out (DB2 only)
specifies the time-out value in seconds if SAS SFTP fails to transfer the files.
Default: 120 seconds
Compile database (Netezza only)
specifies the name of the database where the SAS_COMPILEUDF function is published.
Default: SASLIB
See Also: For more information about publishing the SAS_COMPILEUDF function, see the SAS In-Database Products: Administrator's Guide.
Jazlib database (Netezza only)
specifies the name of the database where the SAS 9.3 Formats Library for Netezza is published.
Default: SASLIB
Instance number (SAP HANA only)
specifies the instance number. Specify either the PORT= argument or the INSTANCE= argument. You can use the DATABASE= argument in the %INDHN_CREATE_MODELTABLE, %INDHN_PUBLISH_MODEL, or %INDHN_RUN_MODEL macro instead of specifying the INSTANCE= argument.
Server and port number (Hadoop only)
specifies the name of the server and the process port number.
Directory path (Hadoop only)
specifies the directory path for the server.
MapReduce server and port number (Hadoop only)
specifies the name of the server and port number where the MapReduce function resides. Scoring files are used by the Hadoop MapReduce function to run the scoring model.