Publishing SAS Formats in Greenplum

Overview of the Publishing Process

The SAS publishing macros are used to publish formats and the SAS_PUT( ) function in Greenplum.
The %INDGP_PUBLISH_FORMATS macro creates the files that are needed to build the SAS_PUT( ) function and publishes those files to the Greenplum database.
This macro also makes many formats that SAS supplies available inside Greenplum. In addition to formats that SAS supplies, you can publish the PROC FORMAT definitions that are contained in a single SAS format catalog by using the FMTCAT= option. The process of publishing a PROC FORMAT catalog entry converts the range label pairs into embedded data in Greenplum.
The %INDGP_PUBLISH_FORMATS macro performs the following tasks:
  • produces the set of .c and .h files that are necessary to build the SAS_PUT( ) function
  • produces a script of the Greenplum commands that are necessary to register the SAS_PUT( ) function in the Greenplum database
  • transfers the .c and .h files to Greenplum
  • calls the SAS_COMPILEUDF function to compile the source files into object files and links to the SAS Formats Library
  • calls the SAS_COPYUDF function to copy the new object files to full-path-to-pkglibdir/SAS on the whole database array (master and all segments) , where full-path-to-pkglibdir is the path that was defined during installation.
  • uses the SAS/ACCESS Interface to Greenplum to run the script to publish the SAS_PUT( ) function to the Greenplum database
The SAS_PUT( ) function is registered in Greenplum with shared object files that are loaded at run time. These functions must be stored in a permanent location. The SAS object files and the SAS Formats Library are stored in the full-path-to-pkglibdir/SAS directory on all nodes, where full-path-to-pkglibdir is the path that was defined during installation.
Greenplum caches the object files within a session.
Note: You can publish format functions with the same name in multiple databases and schemas. Because all format object files are stored in the full-path-to-pkglibdir/SAS directory, the publishing macro uses the database, schema, and model name as the object filename to avoid potential naming conflicts.

Running the %INDGP_PUBLISH_FORMATS Macro

To run the %INDGP_PUBLISH_FORMATS macro, follow these steps:
  1. Start SAS 9.3, and submit one of the following sets of commands in the Program Editor or the Enhanced Editor:
    %indgppf;
    %let indconn = user=youruserid password=yourpwd dsn=yourdsn schema=yourschema; 
    %indgppf;
    %let indconn = user=youruserid password=yourpwd
       database=yourdb server=yourserver schema=yourschema;
    For more information, see %INDGPPF Macro and the INDCONN Macro Variable.
  2. Run the %INDGP_PUBLISH_FORMATS macro.
    For more information, see %INDGP_PUBLISH_FORMATS Macro Syntax.
    Messages are written to the SAS log that indicate whether the SAS_PUT( ) function and format functions were successfully created.

%INDGPPF Macro

The %INDGPPF macro is an autocall library that initializes the format publishing software.

INDCONN Macro Variable

The INDCONN macro variable is used as credentials to connect to Greenplum. You must specify the user, password, and either a DSN name or a server and database name. The schema name is optional. You must assign the INDCONN macro variable before the %INDGD_PUBLISH_FORMATS macro is invoked.
The value of the INDCONN macro variable for the %INDGP_PUBLISH_FORMATS macro has one of these formats:
USER=username PASSWORD=password DSN=dsnname <SCHEMA=schemaname>
USER=username PASSWORD=password SERVER=servername
DATABASE=databasename <SCHEMA=schemaname>
Arguments

USER=username

specifies the Greenplum user name (also called the user ID) that is used to connect to the database.

PASSWORD=password

specifies the password that is associated with your Greenplum user ID.

Tip Use only PASSWORD=, PASS=, or PW= for the password argument. PWD= is not supported and causes an error.

DSN=datasourcename

specifies the configured Greenplum ODBC data source to which you want to connect.

Requirement You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.

SERVER=servername

specifies the Greenplum server name or the IP address of the server host.

Requirement You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.

DATABASE=databasename

specifies the Greenplum database that contains the tables and views that you want to access.

Requirement You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.

SCHEMA=schemaname

specifies the schema name for the database.

Tip If you do not specify a value for the SCHEMA argument, the value of the USER argument is used as the schema name. The schema must be created by your database administrator.

Tip
The INDCONN macro variable is not passed as an argument to the %INDGP_PUBLISH_FORMATS macro. This information can be concealed in your SAS job. You might want to place it in an autoexec file and set the permissions on the file so that others cannot access the user ID and password.

%INDGP_PUBLISH_FORMATS Macro Syntax

%INDGP_PUBLISH_FORMATS
(<DATABASE=database-name>
<, FMTCAT=format-catalog-filename>
<, FMTTABLE=format-table-name>
<, ACTION=CREATE | REPLACE | DROP>
<, OUTDIR=diagnostic-output-directory>
);
Arguments

DATABASE=database-name

specifies the name of a Greenplum database to which the SAS_PUT( ) function and the format functions are published.

Restriction If you specify DSN= in the INDCONN macro variable, do not use the DATABASE argument.
Interaction The database that is specified by the DATABASE= argument takes precedence over the database that you specify in the INDCONN macro variable. For more information, see Running the %INDGP_PUBLISH_FORMATS Macro.

FMTCAT=format-catalog-filename

specifies the name of the format catalog file that contains all user-defined formats that were created by the FORMAT procedure and are made available in Greenplum.

Default If you do not specify a value for FMTCAT= and you have created user-defined formats in your SAS session, the default is WORK.FORMATS.
If you do not specify a value for FMTCAT= and you have not created any user-defined formats in your SAS session, only the formats that SAS supplies are available in Greenplum.
Interactions If the format definitions that you want to publish exist in multiple catalogs, you must copy them into a single catalog for publishing.
If you do not use the default catalog name (FORMATS) or the default library (WORK or LIBRARY) when you create user-defined formats, you must use the FMTSEARCH system option to specify the location of the format catalog. For more information, see PROC FORMAT in the Base SAS Procedures Guide.

FMTTABLE=format-table-name

specifies the name of the Greenplum table that contains all formats that the %INDGP_PUBLISH_FORMATS macro creates and that the SAS_PUT( ) function supports. The format table contains the columns shown in following table.

Format Table Columns
Column Name
Description
FMTNAME
specifies the name of the format.
SOURCE
specifies the origin of the format. SOURCE can contain one of these values:
SAS
supplied by SAS
PROCFMT
User-defined with PROC FORMAT
Default If FMTTABLE is not specified, no table is created. You can see only the SAS_PUT( ) function. You cannot see the formats that are published by the macro.
Interaction If ACTION=CREATE or ACTION=DROP is specified, messages are written to the SAS log that indicate the success or failure of the table creation or drop.

ACTION=CREATE | REPLACE | DROP

specifies that the macro performs one of these actions:

CREATE

creates the SAS_PUT( ) function.

REPLACE

overwrites the current SAS_PUT( ) function, if a SAS_PUT( ) function is already registered.

DROP

causes the SAS_PUT( ) function to be dropped from the Greenplum database.

Default CREATE
Tip If the SAS_PUT( ) function has been previously defined and you specify ACTION=CREATE, you receive warning messages from Greenplum. If the function has been previously defined and you specify ACTION=REPLACE, no warnings are issued.

OUTDIR=diagnostic-output-directory

specifies a directory that contains diagnostic files.

Files that are produced include an event log that contains detailed information about the success or failure of the publishing process.
See Special Characters in Directory Names

Format Publishing Macro Example

%indgppf;
%let indconn = user=user1 password=xxxx dsn=dsnx34 schema=block;
%indgp_publish_formats(fmtcat=work.formats);
This sequence of macros generates a .c and a .h files for each data type. The format data types that are supported are numeric (FLOAT, INT), character, date, time, and timestamp (DATETIME). The %INDGP_PUBLISH_FORMATS macro also produces a text file of Greenplum CREATE FUNCTION commands that are similar to these:
CREATE OR REPLACE FUNCTION dbitest.homeeq_5_em_classification
(
float8,
float8,
float8,
float8,
float8,
varchar(32),
float8,
float8,
varchar(32),
float8,
float8
)
RETURNS varchar(33)
AS '/usr/local/greenplum-db-3.3.4.0/lib/postgresql/SAS/sample_dbitest_homeeq_5.so',
   'homeeq_5_em_classification'
After it is installed, you can use SQL to call the SAS_PUT( ) function in Greenplum. For more information, see Using the SAS_PUT( ) Function in Greenplum.