Publishing Scoring Model Files in Greenplum

The SAS publishing macros are used to publish the formats and the scoring functions in Greenplum.
The %INDGP_PUBLISH_MODEL macro creates the files that are needed to build the scoring functions and publishes the scoring functions with those files to a specified database in Greenplum. Only the EM_ output variables are published as Greenplum scoring functions. For more information about the EM_ output variables, see Fixed Variable Names.
The %INDGP_PUBLISH_MODEL macro uses some of the files that are created by the SAS Enterprise Miner Score Code Export node: the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog.
The %INDGP_PUBLISH_MODEL macro performs the following tasks:
  • takes the score.sas and score.xml files and produces a set of .c and .h files. These files are necessary to build a separate scoring function for each of a fixed set of quantities that the scoring model code can compute.
  • if a format catalog is available, processes the format catalog and creates an .h file with C structures, which are also necessary to build the scoring functions.
  • produces a script of the Greenplum commands that are used to register the scoring functions in the Greenplum database.
  • transfers the .c and .h files to Greenplum.
  • calls the SAS_COMPILEUDF function to compile the source files into object files and link them to the SAS formats library.
  • calls the SAS_COPYUDF function to copy the new object files to full-path-to-pkglibdir/SAS on the whole database array (master and all segments), where full-path-to-pkglibdir is the path that was defined during installation.
  • uses the SAS/ACCESS Interface to Greenplum to run the script to create the scoring functions with the object files.
The scoring functions are registered in Greenplum with shared object files, which are loaded at run time. These functions are stored in a permanent location. The SAS object files and the SAS formats library are stored in the full-path-to-pkglibdir/SAS directory on all nodes, where full-path-to-pkglibdir is the path that was defined during installation.
Greenplum caches the object files within a session.
Note: You can publish scoring model files with the same model name in multiple databases and schemas. Because all model object files for the SAS scoring function are stored in the full-path-to-pkglibdir/SAS directory, the publishing macros use the database, schema, and model name as the object filename to avoid potential naming conflicts.
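The naming scheme in the note above can be illustrated with a small sketch. The exact filename format that the publishing macros generate is not specified here, so the underscore-joined pattern, the .so extension, the object_file_name helper, and the sample database, schema, and model names below are all hypothetical. The sketch only shows why combining the three identifiers keeps same-named models from colliding in a single shared directory.

```python
# Hypothetical illustration only: the real macros' filename format is not
# documented in this section. The point is that database + schema + model
# name together identify one object file in the shared SAS directory.

def object_file_name(database: str, schema: str, model: str) -> str:
    """Build an assumed object filename from the three identifiers."""
    return f"{database}_{schema}_{model}.so"


# The same model name ("churn") published to three different places.
published = [
    ("sales", "public", "churn"),
    ("sales", "staging", "churn"),
    ("marketing", "public", "churn"),
]

names = [object_file_name(db, schema, model) for db, schema, model in published]

# Each publication maps to its own filename, so none overwrites another.
for name in names:
    print(name)
```

If only the model name were used, all three publications in this sketch would map to the same file in full-path-to-pkglibdir/SAS and each would overwrite the previous one.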