Running the %INDGP_PUBLISH_MODEL Macro

%INDGP_PUBLISH_MODEL Macro Run Process

To run the %INDGP_PUBLISH_MODEL macro, complete the following steps:
  1. Create a scoring model using SAS Enterprise Miner.
  2. Use the SAS Enterprise Miner Score Code Export node to create a score output directory and populate the directory with the score.sas file, the score.xml file, and, if needed, the format catalog.
  3. Start SAS 9.3 and submit one of the following sets of commands in the Program Editor or Enhanced Editor.
    To connect through a configured ODBC data source:
    %indgppm;
    %let indconn = user=youruserid password=yourpwd
        dsn=yourdsn;
    To connect by server and database name:
    %indgppm;
    %let indconn = user=youruserid password=yourpwd server=yourserver
        database=yourdb schema=yourschema;
    
    For more information, see %INDGPPM Macro and INDCONN Macro Variable.
  4. Run the %INDGP_PUBLISH_MODEL macro. For more information, see %INDGP_PUBLISH_MODEL Macro Syntax.
    Messages are written to the SAS log that indicate the success or failure of the creation of the scoring functions.

%INDGPPM Macro

The %INDGPPM macro searches the autocall library for the indgppm.sas file. The indgppm.sas file contains all the macro definitions that are used in conjunction with the %INDGP_PUBLISH_MODEL macro. The indgppm.sas file should be in one of the directories listed in the SASAUTOS= system option in your configuration file. If the indgppm.sas file is not present, the %INDGPPM macro call (%INDGPPM; statement) issues the following message:
macro indgppm not defined
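If this message appears, it can help to inspect the autocall search path before calling %INDGPPM again. The following is a minimal sketch, not part of the product documentation: the directory C:\SASMacros\indb is a hypothetical location and must be replaced with the directory that actually holds indgppm.sas at your site.

```sas
/* Show the directories that are currently searched for autocall macros. */
%put NOTE: SASAUTOS=%sysfunc(getoption(sasautos));

/* Hypothetical path: replace it with the directory that holds indgppm.sas. */
/* MAUTOSOURCE enables the autocall facility; the SASAUTOS fileref keeps    */
/* the default autocall library in the search path.                         */
options mautosource sasautos=(sasautos "C:\SASMacros\indb");

%indgppm;
```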

INDCONN Macro Variable

The INDCONN macro variable provides the credentials that are used to connect to Greenplum. You must specify the user, the password, and either the DSN or the server and database names. The schema is optional; if you omit it, the user name is used as the schema name. You must assign the INDCONN macro variable before the %INDGP_PUBLISH_MODEL macro is invoked.
The value of the INDCONN macro variable for the %INDGP_PUBLISH_MODEL macro has one of these formats:
USER=username PASSWORD=password DSN=dsnname
USER=username PASSWORD=password SERVER=servername DATABASE=databasename SCHEMA=schemaname
Arguments
USER=<'>username<'>
specifies the Greenplum user name (also called the user ID) that is used to connect to the database. If the user name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
PASSWORD=<'>password<'>
specifies the password that is associated with your Greenplum user ID. If the password contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Tip: Use only PASSWORD=, PASS=, or PW= for the password argument. PWD= is not supported and causes an error.
DSN=<'>datasourcename<'>
specifies the configured Greenplum ODBC data source to which you want to connect. If the DSN contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
SERVER=<'>servername<'>
specifies the Greenplum server name or the IP address of the server host. If the server name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
DATABASE=<'>databasename<'>
specifies the Greenplum database that contains the tables and views that you want to access. If the database name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
SCHEMA=<'>schemaname<'>
specifies the schema name for the database.
Tip: If you do not specify a value for the SCHEMA= argument, the value of the USER= argument is used as the schema name. The schema must be created by your database administrator.
Tip: The INDCONN macro variable is not passed as an argument to the %INDGP_PUBLISH_MODEL macro, so this information can be concealed in your SAS job. You might want to place it in an autoexec file and set the permissions on the file so that others cannot access the user ID and password.
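As an illustration of the server and database form, the following sketch assigns INDCONN with a quoted password and then confirms that the variable exists without writing the credentials to the log. The names user1, gpserver, salesdb, and myschema, the password value, and the check_indconn macro are placeholders that were invented for this example.

```sas
/* Placeholder values: replace them with your own connection details.     */
/* The password is quoted because it contains a space and a nonalphanumeric */
/* character.                                                               */
%let indconn = user=user1 password='open 1!' server=gpserver
    database=salesdb schema=myschema;

/* Hypothetical helper: reports whether INDCONN exists, without echoing it. */
/* %IF is wrapped in a macro because SAS 9.3 does not allow it in open code. */
%macro check_indconn;
    %if %symexist(indconn) %then %put NOTE: INDCONN is assigned.;
    %else %put ERROR: INDCONN is not assigned.;
%mend check_indconn;

%check_indconn;
```

Statements like the %LET above are the kind of content that could be placed in a protected autoexec file, so that the publishing program itself never contains the password.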

%INDGP_PUBLISH_MODEL Macro Syntax

%INDGP_PUBLISH_MODEL
(DIR=input-directory-path, MODELNAME=name
<, DATASTEP=score-program-filename>
<, XML=xml-filename>
<, DATABASE=database-name>
<, FMTCAT=format-catalog-filename>
<, ACTION=CREATE | REPLACE | DROP>
<, OUTDIR=diagnostic-output-directory>
);
Arguments
DIR=input-directory-path
specifies the directory where the scoring model program, the properties file, and the format catalog are located.
This is the directory that is created by the SAS Enterprise Miner Score Code Export node. This directory contains the score.sas file, the score.xml file, and (if user-defined formats were used) the format catalog.
Requirement: You must use a fully qualified pathname.
Interaction: If you do not use the default directory that is created by SAS Enterprise Miner, you must specify the DATASTEP=, XML=, and (if needed) FMTCAT= arguments.
MODELNAME=name
specifies the name that is prepended to each output function to ensure that each scoring function name is unique on the Greenplum database.
Restriction: The scoring function name is a combination of the model and output variable names. A scoring function name cannot exceed 63 characters. For more information, see Scoring Function Names.
Requirement: The model name must be a valid SAS name that is 10 characters or fewer. For more information about valid SAS names, see the topic on rules for words and names in SAS Language Reference: Concepts.
Interaction: Only the EM_ output variables are published as Greenplum scoring functions. For more information about the EM_ output variables, see Fixed Variable Names and Scoring Function Names.
DATASTEP=score-program-filename
specifies the name of the scoring model program file that was created by using the SAS Enterprise Miner Score Code Export node.
Default: score.sas
Restriction: Only DATA step programs that are produced by the SAS Enterprise Miner Score Code Export node can be used.
Interaction:If you use the default score.sas file that is created by the SAS Enterprise Miner Score Code Export node, you do not need to specify the DATASTEP= argument.
XML=xml-filename
specifies the name of the properties XML file that was created by the SAS Enterprise Miner Score Code Export node.
Default: score.xml
Restrictions: Only XML files that are produced by the SAS Enterprise Miner Score Code Export node can be used.

The maximum number of output variables is 128.

Interaction: If you use the default score.xml file that is created by the SAS Enterprise Miner Score Code Export node, you do not need to specify the XML= argument.
DATABASE=database-name
specifies the name of a Greenplum database to which the scoring functions and formats are published.
Restriction: If you specify DSN= in the INDCONN macro variable, do not use the DATABASE= argument.
Interaction: The database that is specified by the DATABASE= argument takes precedence over the database that you specify in the INDCONN macro variable. For more information, see %INDGP_PUBLISH_MODEL Macro Run Process.
FMTCAT=format-catalog-filename
specifies the name of the format catalog file that contains all user-defined formats that were created by the FORMAT procedure and that are referenced in the DATA step scoring model program.
Restriction: Only format catalog files that are produced by the SAS Enterprise Miner Score Code Export node can be used.
Interactions: If you use the default format catalog that is created by the SAS Enterprise Miner Score Code Export node, you do not need to specify the FMTCAT= argument.

If you do not use the default catalog name (FORMATS) or the default library (WORK or LIBRARY) when you create user-defined formats, you must use the FMTSEARCH system option to specify the location of the format catalog. For more information, see PROC FORMAT in the Base SAS Procedures Guide.

ACTION=CREATE | REPLACE | DROP
specifies one of the following actions that the macro performs:
CREATE
creates a new function.
REPLACE
overwrites the current function, if a function by the same name is already registered.
DROP
causes all functions for this model to be dropped from the Greenplum database.
Default: CREATE
Tip: If the function has been previously defined and you specify ACTION=CREATE, you receive warning messages from Greenplum. If the function has been previously defined and you specify ACTION=REPLACE, no warnings are issued.
OUTDIR=diagnostic-output-directory
specifies a directory that contains diagnostic files.
Files that are produced include an event log that contains detailed information about the success or failure of the publishing process and sample SQL code (SampleSQL.txt). For more information about the SampleSQL.txt file, see Scoring Function Names.
Tip: This argument is useful when testing your scoring models.
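
The optional arguments can be combined as in the following sketch, which republishes a model from a directory that does not use the default file names and later drops its functions. The file names bb_score.sas and bb_score.xml are hypothetical; the connection values and directories are reused from the example in this section and are placeholders as well.

```sas
%indgppm;
%let indconn = user=user1 password=open1 dsn=green6 schema=myschema;

/* Republish with non-default file names (hypothetical names).           */
/* ACTION=REPLACE overwrites functions that are already registered.      */
%indgp_publish_model(
    dir=C:\SASIN\baseball1,
    modelname=baseball1,
    datastep=bb_score.sas,
    xml=bb_score.xml,
    action=replace,
    outdir=C:\test);

/* Drop all scoring functions for the model when it is no longer needed. */
%indgp_publish_model(
    dir=C:\SASIN\baseball1,
    modelname=baseball1,
    action=drop);
```

DATABASE= is left out here because INDCONN uses DSN=, and the two must not be combined.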

Model Publishing Macro Example

%indgppm;
%let indconn = user=user1 password=open1 dsn=green6 schema=myschema;
%indgp_publish_model( dir=C:\SASIN\baseball1, modelname=baseball1, outdir=C:\test);
The %INDGP_PUBLISH_MODEL macro produces a text file of Greenplum CREATE FUNCTION commands as shown in the following example.
Note: This example file is shown for illustrative purposes. The text file that is created by the %INDGP_PUBLISH_MODEL macro cannot be viewed and is deleted after the macro is complete.
CREATE FUNCTION baseball1_EM_eventprobability
(
"CR_ATBAT" float,
"CR_BB" float,
"CR_HITS" float,
"CR_HOME" float,
"CR_RBI" float,
"CR_RUNS" float,
"DIVISION" varchar(31),
"LEAGUE" varchar(31),
"NO_ASSTS" float,
"NO_ATBAT" float,
"NO_BB" float,
"NO_ERROR" float,
"NO_HITS" float,
"NO_HOME" float,
"NO_OUTS" float,
"NO_RBI" float,
"NO_RUNS" float,
"YR_MAJOR" float
)
RETURNS varchar(33)
AS '/usr/local/greenplum-db-3.3.4.0/lib/postgresql/SAS/sample_dbitest_homeeq_5.so',
   'homeeq_5_em_classification'
After the scoring functions are installed, they can be invoked in Greenplum using SQL, as illustrated in the following example. Each output value is created as a separate function call in the select list.
select baseball1_EM_eventprobability
(
"CR_ATBAT",
"CR_BB",
"CR_HITS",
"CR_HOME",
"CR_RBI",
"CR_RUNS",
"DIVISION",
"LEAGUE",
"NO_ASSTS",
"NO_ATBAT",
"NO_BB",
"NO_ERROR",
"NO_HITS",
"NO_HOME",
"NO_OUTS",
"NO_RBI",
"NO_RUNS",
"YR_MAJOR"
) as homeRunProb from MLBGP;
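
The same query could also be submitted from a SAS session through explicit SQL pass-through. The following is a sketch under several assumptions: SAS/ACCESS Interface to Greenplum is licensed, green6 is a configured DSN, the scoring function is visible on the connection's search path (otherwise qualify it with the schema name), and the output table name work.homerun_scores is arbitrary. The argument list matches the 18 parameters in the CREATE FUNCTION example above.

```sas
proc sql;
    /* Assumed connection values, reused from the publishing example. */
    connect to greenplm (user=user1 password=open1 dsn=green6);

    /* The inner query runs in Greenplum; only the result rows return to SAS. */
    create table work.homerun_scores as
    select * from connection to greenplm (
        select baseball1_EM_eventprobability(
            "CR_ATBAT", "CR_BB", "CR_HITS", "CR_HOME", "CR_RBI", "CR_RUNS",
            "DIVISION", "LEAGUE",
            "NO_ASSTS", "NO_ATBAT", "NO_BB", "NO_ERROR", "NO_HITS",
            "NO_HOME", "NO_OUTS", "NO_RBI", "NO_RUNS", "YR_MAJOR"
        ) as homeRunProb
        from MLBGP
    );

    disconnect from greenplm;
quit;
```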