Publishing Models to a Database or Hadoop

About Publishing Models to a Database or Hadoop

SAS Model Manager enables you to publish the project champion model and challenger models that are associated with the DATA Step score code type to a configured database. SAS Model Manager uses the SAS Scoring Accelerator and SAS/ACCESS interface to the database to publish models to the database or Hadoop Distributed File System (HDFS). The Scoring Accelerator takes the models from SAS Model Manager and translates them into scoring files or functions that can be deployed inside the database. After the scoring functions are published using the SAS/ACCESS interface to the database, the functions extend the database’s SQL language and can be used in SQL statements such as other database functions. After the scoring files are published, they are used by the SAS Embedded Process to run the scoring model.
If the scoring function publish method is chosen, the scoring metadata tables in the database are populated with information about the project and pointers to the scoring function. This feature enables users to review descriptions and definitions of the published model. The audit logs track the history of the model's usage and any changes that are made to the scoring project.
For more information about the SAS Scoring Accelerator, see the SAS In-Database Technology page available at http://support.sas.com.

Prerequisites for Publishing to a Database or Hadoop

The following prerequisites must be completed before users can publish a model scoring function using the scoring function publish method, or publish a model’s scoring files using the SAS Embedded Process publish method:
  • The user must have the proper authorization to publish approved models from SAS Model Manager to the database or HDFS for SAS In-Database scoring.
    Note: When using the SAS Embedded Process publish method to publish models from SAS Model Manager to a Teradata database, you must have a user database with the appropriate user permissions in order to validate the scoring results. For more information, see Teradata Permissions for Publishing Formats and Scoring Models in SAS In-Database Products: Administrator’s Guide.
    Note: If your system is configured for Kerberos authentication, each user must have a valid Kerberos ticket. Also, you must complete post-installation configuration steps to enable users to publish models from the SAS Model Manager application. For more information, see Configure Users Authenticated by Kerberos for Publishing Models in SAS Model Manager: Administrator’s Guide.
  • The champion model for the project must be set.
  • A predictive (classification or prediction), analytical, or segmentation model must have been selected for production scoring deployment via SAS Model Manager.
    SAS Model Manager can publish models to a database or Hadoop when they are associated with the DATA step score code type. Models that have a score code type of Analytic store can be published only to Hadoop and Teradata. Models that have a score code type of SAS Program, PMML, or DS2 cannot be published to a database.
    The score code component file (score.sas) is DATA step score code and is used as input by the SAS Scoring Accelerator when publishing a model to a database. When you use the scoring function publish method, some SAS language elements and syntax are not supported when you create or modify your score code. Only the SAS language elements and syntax that are required to run critical data transformations and model scoring functions are available. If you use a statement or function that is not supported, an error occurs and your model is not published to the database. For more information, see Considerations When Creating or Modifying DATA Step Score Code in SAS In-Database Products: User’s Guide.
    The SAS Scoring Accelerator requires three files when you are publishing a SAS analytic store model. The files are score.sas, score.sasast, and score.xml. The score.xml file is generated at publish time by SAS Model Manager. For more information, see Introduction to Analytic Store Scoring in SAS In-Database Products: User’s Guide.
  • A database must have been configured to install scoring functions or model scoring files.
  • If the model contains user-defined formats, a file that contains the user-defined formats must be attached to the version and stored in a format catalog.
  • The following prerequisites are only for the scoring function publish method.
    • (Optional) A project user-defined property DbmsTable is defined for the champion version of the project from which to publish the scoring function.
      Note: The DbmsTable property must be defined if you plan to use a scoring application or SQL code to score your model.
    • The JDBC driver must be accessible from the middle-tier server when using the scoring function publish method.
    • The scoring function metadata tables are required in the target database if the Metadata usage option is enables in SAS Management Console.

Make User-Defined Formats Available When Publishing Models to a Database

In order to publish models with user-defined formats to a database, you must make the user-defined formats available to SAS Model Manager.
To make the user-defined formats available for publishing:
  1. Translate the user-defined formats SAS catalog (formats.sas7bcat) that was created with the model into a formats.cport file.
    Here is an example:
    filename tranfile "C:\formats.cport";
    libname source "C:\myformats";
                     
    proc cport library=source file=tranfile memtype=catalog;
    run;
    quit;
    
  2. Attach the formats.cport file to the version that contains the project champion model or challenger models. For more information, see Attach a Portable Formats File.
  3. Send a request to the SAS administrator and ask them to either put the user-defined formats catalog (formats.sas7bcat) in the \\SASConfigDirectory\Lev1\SASApp\SASEnvironment\SASFormats directory or add the LIBNAME definition for the formats library to the \\SASConfigDirectory\Lev1\SASApp\appserver_autoexec_usermods file.
    Here is an example of a LIBNAME definition:
    libname mylib "C:\myformats";
    options fmtsearch = (mylib.formats);
    

How to Publish Models to a Database or Hadoop

To publish a model to a database:
  1. Select a project and click Publish Models.
    Publish Models - Teradata Example
  2. Select a publish destination from the list. The publish destination can be either a database or a Hadoop Distributed File System.
  3. Select a publish method. Specifies the method to use when publishing the scoring function or model files to the database.
  4. Select one or more models to publish from the models list.
  5. To specify a new publish name, click in the Publish Name column for a selected model. Specifies the name to use when publishing a scoring function or model files to the database. The publish name is a user-defined value that can be modified. The SAS Embedded Process publish method uses the Publish Name as the model name to publish the model files to the database. The scoring function publish method has a system-generated Prefix and the Publish Name that makes up the scoring function name. These are used to publish the model scoring function. The prefix portion of the scoring function name is 11 characters long and is in the format of Yyymmddnnn_:
    • Y is a literal character and is fixed for all prefixes.
    • yy is the two-digit year.
    • mm is the month and ranges from 01 to 12.
    • dd is the day and ranges from 01 to 31.
    • nnn is a counter that increments by 1 each time that a scoring function completes successfully. The value can range from 001 to 999.
    • _ is the underscore that ends the prefix.
    The yymmdd value in the prefix is the GMT timestamp that identifies the date on which you published the model. An example of a function name is Y081107001_user_defined_value. Here are the naming convention requirements:
    Here are the naming convention requirements for the publish name:
    • The user-defined value is case insensitive. The maximum length of alphanumeric characters is determined by the database type and publish method that is selected. No spaces are allowed. An underscore is the only special character that can be included in the publish name.
    • The recommended maximum lengths of the publish name for the scoring function publish method are the following:
      • 19 alphanumeric characters for Teradata
      • 32 alphanumeric characters for Netezza, Greenplum, and DB2
      UNIX Specifics: The publish name (user-defined) portion of the function name in an AIX environment has a maximum length of 16 alphanumeric characters for Teradata.
    • The recommended maximum length of the publish name for the SAS Embedded Process publish method is 32 alphanumeric characters for all database types. The database types that are currently supported by SAS Model Manager are Teradata, Oracle, Greenplum, and DB2.
    The value of the publish name is validated against the target database, when the option Replace scoring files that have the same publish name is not selected for the SAS Embedded Process publish method. If the publish name is not unique, an error message is displayed.
    Note: The default format of the publish name is configured by the SAS administrator.
  6. (Optional) Select whether to Replace scoring files that have the same publish name. Specifies to replace the model scoring files that have the same publish name when you are using the SAS Embedded Process publish method. The value of the publish name is validated against the target database when this option is not selected. If the publish name is not unique, an error message is displayed.
  7. Specify an identifier to add to the database target table for each model. Specifies the value of the identifier that is added to each model in the database so that the Database administrator or other users can query the database. The default value is the project name. This option is available only for the SAS Embedded Process publish method.
  8. (Optional) Select whether to Validate scoring results. Specifies to validate the scoring results when publishing a model scoring function or model scoring files. This option creates a benchmark scoring result on the SAS Workspace Server using the DATA Step score code. The scoring input data set is used to create an equivalent database table. Scoring is performed using the new scoring function or model scoring files and database table. The scoring results are then compared. If selected, click Browse to navigate to the appropriate training table. The default train table that is specified in the properties of the published model is used by default.
    Note: When using the SAS Embedded Process publish method to publish models from SAS Model Manager to a Teradata database, your user database must have the appropriate user permissions to validate the scoring results. For more information, see Teradata Permissions for Publishing Formats and Scoring Models in SAS In-Database Products: Administrator’s Guide.
  9. Specify the database settings.
    Note: For Hadoop and Teradata, when you are using the SAS Embedded Process publish method, the User ID and Password settings are hidden if your system is configured for Kerberos authentication. For Netezza, when you are publishing using the Scoring function publish method, the name of the database is case sensitive.
    Here are the available database settings according to the publish method and database type:
    Database Settings
    SAS Embedded Process
    Scoring Function
    Server
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • Hadoop
    • DB2
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Database
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Instance number
    SAP HANA
    Not applicable
    User ID
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Hadoop
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Password
    • Teradata
    • Oracle
    • Netezza
    • Greenplum
    • DB2
    • Hadoop
    • SAP HANA
    • Teradata
    • Netezza
    • Greenplum
    • DB2
    Server user ID
    Not applicable
    DB2
    Compile database
    Not applicable
    Netezza
    Jazlib database
    Not applicable
    Netezza
    Schema
    • Oracle
    • Greenplum
    • DB2
    • SAP HANA
    • Greenplum
    • DB2
    Initial wait time (in seconds)
    Not applicable
    DB2
    FTP time out (in seconds)
    Not applicable
    DB2
    Directory path
    Hadoop
    Not applicable
    For a description of each database setting, see Descriptions of Database Settings.
  10. Click More Options to specify other options for the database.
    Keep scoring function if validation fails (scoring function) or Keep scoring files if validation fails (SAS Embedded Process)
    specifies to save the scoring function or model scoring files if the validation of the scoring results fails. Saving the scoring function or model scoring files is useful for debugging if validation fails.
    Sample size
    specifies the size of the sample to use for validating the scoring function or model files. The default value is 100. The maximum number of digits that are allowed is 8.
    Display detailed log messages
    provides detailed information, which includes warnings and error messages that occur when you publish a scoring function or scoring model files.
    Use model input
    specifies to use the selected model input when publishing the scoring function or model files instead of using the project input, which is the default. This is useful when the project input variables exceed the limitations for a database.
    Here are the limitations for the number of model input variables when publishing a champion model or challenger model to a database:
    Database Type
    SAS Embedded Process
    Scoring Function
    DB2
    The maximum depends on the page size of the database tablespace. For a 4K page size database, the limit is 500. If you have it configured for any of the larger page sizes (8K, 16K, 32K), then the limit is 1012.
    90
    Greenplum
    1660
    100
    Hadoop
    No limit
    Not applicable
    Netezza
    1600
    64
    Oracle
    1000
    Not applicable
    SAP HANA
    1000
    Not applicable
    Teradata
    If you use Teradata version 13.1 or 14.0, the maximum is 1024. If you use the SAS Embedded Process and Teradata version 14.10, the maximum is 2048.
    128
    Protected mode (Teradata only)
    specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, protected and unprotected. You specify the mode by selecting or deselecting the Protected mode option. The default mode of operation is protected. Protected mode means that the macro code is isolated in a separate process from the Teradata database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in protected mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unprotected mode. You might see a significant performance advantage when you run the Publish Scoring Function in unprotected mode.
    Fenced mode (DB2 and Netezza only)
    specifies the mode of operation to use when publishing a model using the scoring function publish method. There are two modes of operation, fenced and unfenced. You specify the mode by selecting or deselecting the Fenced mode option. The default mode of operation is fenced. Fenced mode means that the macro code is isolated in a separate process from the DB2 database, and an error does not cause database processing to fail. You should run the Publish Scoring Function in fenced mode during validation. When the model is ready for production, you can run the Publish Scoring Function in unfenced mode. You might see a significant performance advantage when you run the Publish Scoring Function in unfenced mode.
  11. Click Publish.

Descriptions of Database Settings

The following are descriptions of the database settings that are used for publishing models.
Server
specifies the name of the server where the database resides. The port can be included as part of the server name (for example, server-name:port). When you are using the scoring function publish method, the default port is used if you do not specify the port as part of the server name.
Note: The port must be included as part of the server name for SAP HANA SPS09 or higher. For SAP HANA SPS08, you must use the instance number setting instead of adding the port to the server name.
Database
specifies the name of the database.
Note: For Netezza, the name of the database is case sensitive if you are publishing models using the Scoring function publish method.
User ID
specifies the user identification that is required to access the database.
Note: For Hadoop or Teradata, this setting is hidden if your system is configured for Kerberos authentication.
Password
specifies the password that is associated with the User ID.
Note: For Hadoop or Teradata, this setting is hidden if your system is configured for Kerberos authentication.
Server user ID (DB2 only)
specifies the user ID for SAS SFTP. This value enables you to access the machine on which you have installed the DB2 database. If you do not specify a value for Server user ID, the value of User ID is used as the user ID for SAS SFTP.
Schema (Greenplum, Oracle, and DB2)
specifies the schema name for the database. The schema name is owned by the user that is specified in the User ID field. The schema must be created by your database administrator.
Initial wait time (DB2 only)
specifies the initial wait time in seconds for SAS SFTP to parse the responses and complete the SFTP –batch file process.
Default: 15 seconds
FTP time out (DB2 only)
specifies the time-out value in seconds if SAS SFTP fails to transfer the files.
Default: 120 seconds
Compile database (Netezza only)
specifies the name of the database where the SAS_COMPILEUDF function is published.
Default: SASLIB
See Also: For more information about publishing the SAS_COMPILEUDF function, see the SAS In-Database Products: Administrator’s Guide.
Jazlib database (Netezza only)
specifies the name of the database where the SAS 9.3 Formats Library for Netezza is published.
Default: SASLIB
Instance number (SAP HANA only)
specifies the instance number.
Note: The instance number must be used for SAP HANA SPS08, instead of adding the port to the server name. For SAP HANA SPS09 or higher the port must be included as part of the server name and the instance number is ignored.
Directory path (Hadoop only)
specifies the directory path for the server.
Last updated: June 12, 2017