SAS Accelerator Publishing Agent for Greenplum

Prerequisites

SAS Foundation and the SAS/ACCESS Interface to Greenplum must be installed before you install and configure the SAS Accelerator Publishing Agent for Greenplum.

Overview of the SAS Accelerator Publishing Agent for Greenplum

This section describes how to install and configure the SAS Accelerator Publishing Agent for Greenplum. The SAS 9.2 Formats Library for Greenplum is included with the SAS Accelerator Publishing Agent for Greenplum. This section also describes how to install the SAS 9.2 Formats Library.
The SAS Accelerator Publishing Agent for Greenplum enables you to use a scoring publishing macro (%INDGP_PUBLISH_MODEL) to create scoring model functions inside the database.
The SAS 9.2 Formats Library for Greenplum is a run-time library that is installed on your Greenplum system so that the SAS scoring model functions created in Greenplum can access the routines within its run-time library.
The SAS Accelerator Publishing Agent for Greenplum contains two publishing macros:
  • %INDGP_PUBLISH_MODEL, which enables you to create scoring model functions.
  • %INDGP_PUBLISH_COMPILEUDF, which registers utility functions in the database. The utility functions are called by the scoring publishing macro, %INDGP_PUBLISH_MODEL. You must run this macro before you run the scoring publishing macro.

Function Publishing Process in Greenplum

To publish the SAS scoring model functions to a Greenplum database, the publishing macros perform the following tasks:
  • Create and transfer the source files to the Greenplum server.
    The files are transferred through database tables. Before transfer, each source file is divided into 32K blocks and converted to hexadecimal values to avoid problems with special characters, such as line feed or quotation marks. After the files are exported to a temporary directory on the database server, the source files are converted back to text.
  • Compile those source files into object files using the appropriate compiler for the Greenplum system.
  • Link with the SAS 9.2 Formats Library.
  • Copy the shared object files to full-path-to-pkglibdir/SAS. The object files are loaded when the scoring model functions are called.
  • Register the scoring model functions in Greenplum with those object files. If an existing scoring model function is replaced, the publishing macros replace the obsolete object file upon successful compilation and publication of the new scoring model function.

Greenplum Installation and Configuration Steps

  1. Apply Hot Fix B85002 for SAS Table Server Base Components 9.2_M2. This hot fix can be found at http://ftp.sas.com/techsup/dowload/hotfix/HF2/B85.html#B85002.
  2. Move and unpack the SAS 9.2 Formats Library and binary files for the publishing macros.
  3. Run the %INDGP_PUBLISH_COMPILEUDF macro.

Moving and Unpacking the SAS 9.2 Formats Library and Binary Files

The SAS 9.2 Formats Library and the binary files for the publishing macros are contained in a self-extracting TAR file on the SAS Software Depot. The location of the TAR file, the README file, and possibly other files, is contained in the ordersummary.html file that accompanies the software. Your order e-mail contains the path to the ordersummary.html file.
To move and unpack the TAR file, follow these steps:
  1. Using a method of your choice, transfer the accelgplmfmt.sh file to your Greenplum master node.
    The file does not have to be downloaded to a specific location. However, you need to note where it is downloaded so that it can be executed at a later time.
    The accelgplmfmt.sh file is located in the install-depot-dir\products\accelgplmfmt_92110_prt_xx_sp0_1 directory.
  2. After the accelgplmfmt.sh has been transferred, log in to the Greenplum master node.
  3. Move to the directory where the TAR file was downloaded.
  4. Use the following command at the UNIX prompt to unpack the TAR file.
    ./accelgplmfmt.sh
    Note: If you receive a “permissions denied” message, check the permissions on the accelgplmfmt.sh file. This file must have EXECUTE permissions to run.
    After the script runs and the files are unpacked, the content of the target directories should look similar to these where path_to_tar_file is the location to which you copied the TAR file.
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/9.2-1.8/bin/
        InstallAccelGplmFmt.sh
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/9.2-1.8/bin/
        CopySASFiles.sh 
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/9.2-1.8/lib/
        SAS_CompileUDF.so
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/9.2-1.8/lib/
        libjazxfbrs.so 
  5. Use the following command to place the files in Greenplum:
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/9.2-1.8/bin/
        CopySASFiles.sh
    All the SAS object files are stored under full-path-to-pkglibdir/SAS. The files are copied to the master node and each of the segment nodes.
    Note: You can use the following command to determine the full-path-to-pkglibdir directory:
    $ pg_config --pkglibdir
    The pg_config --pkglibdir command must be run by the person who performed the Greenplum install.
    Note: If you add new nodes at a later date, you must copy all the binary files to the new nodes. For more information, see Step 7.
  6. From the master node, follow these steps to create symbolic links to the SAS 9.2 Formats Library for Greenplum. The symbolic links are created where the library was loaded on each node in the database array including the master and all segments.
    1. Use the following command to determine the full path to where the library was loaded.
      $ pg_config --libdir
      This is the path where the symbolic link is created.
    2. Use the following command to determine the SAS In-Database shared library deployment path.
      $ pg_config --pkglibdir
      This is the path that is linked to and where the SAS 9.2 Formats Library for Greenplum is deployed.
    3. Use the following command to create the symbolic link on the master node.
      $ ln -s path-from-pg_config --pkglibdir/SAS/libjazxfbrs.so 
         path-from-pg_config --libdir/libjazxfbrs.so
      Use the value from Step 6b for path-from-pg_config --pkglibdir. Use the value from Step 6a for path-from-pg_config --libdir.
    4. Use the following commands to connect to each of the segment nodes and create the symbolic links on each of the nodes.
      /* Use this command from the master node to connect to each segment node */
      $ ssh <segment nodename>
      /* Use this commmand on each segment node to create the link */
      $ ln -s path-from-pg_config --pkglibdir>/SAS/libjazxfbrs.so 
          path-from-pg_config --libdir/libjazxfbrs.so
      
      To verify that the link is created correctly, go to the directory that results from running the pg_config --libdir command and list libjazxfbrs.so.
  7. (Optional) If you add new nodes to the Greenplum master node after the initial installation of the SAS Accelerator Publishing Agent for Greenplum, you must copy all the binaries in the full-path-to-pkglibdir/SAS directory, including SAS_CompileUDF.so, libjazxfbrs.so, and the binary files for the already published functions, to the new nodes using a method of your choice. In addition, you must repeat Step 6 to create the symbolic links to the SAS 9.2 Formats Library for Greenplum (libjazfbrs.so).

Running the %INDGP_PUBLISH_COMPILEUDF Macro

Overview of the %INDGP_PUBLISH_COMPILEUDF Macro

The %INDGP_PUBLISH_COMPILEUDF macro publishes the following functions to the SASLIB schema in a Greenplum database:
  • SAS_COMPILEUDF
    The SAS_COMPILEUDF function facilitates the %INDGP_PUBLISH_MODEL scoring publishing macro. The SAS_COMPILEUDF function performs the following tasks:
    • compiles the scoring model source files into object files. This compilation occurs through the SQL interface using an appropriate compiler for the system.
    • copies the object files to the full-path-to-pkglibdir/SAS directory. All the SAS object files are stored under full-path-to-pkglibdir/SAS. You can use the pg_config --pkglibdir command to determine the full-path-to-pkglibdir directory.
    • creates a link to the SAS 9.2 Formats Library for Greenplum, which is needed for scoring model publishing.
  • Three utility functions that are used when the scoring publishing macro transfers source files from the client to the host:
    • SAS_COPYUDF function
      The SAS_COPYUDF function copies the shared libraries to the full-path-to-pkglibdir/SAS path on the whole database array including the master and all segments. The SAS_COPYUDF function also registers the object files in Greenplum.
    • SAS_DIRECTORYUDF function
      The SAS_DIRECTORYUDF function creates and removes a temporary directory that holds the source files on the server.
    • SAS_DEHEXUDF function
      The SAS_DEHEXUDF function converts the files from hexadecimal back to text after the files are on the host.
    For more information about the file transfer process, see Function Publishing Process in Greenplum.
You have to run the %INDGP_PUBLISH_COMPILEUDF macro only one time in each database.
The SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions must be published before you run the %INDGP_PUBLISH_MODEL macro. Otherwise, the scoring model publishing fails.
Note: To publish the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions, you must have superuser permissions to create and execute these functions in the SASLIB schema and in the specified database.

%INDGP_PUBLISH_COMPILEUDF Macro Run Process

To run the %INDGP_PUBLISH_COMPILEUDF macro, follow these steps:
Note: To publish the SAS_COMPILEUDF function, you must have superuser permissions to create and execute this function in the SASLIB schema and in the specified database.
  1. Create a SASLIB schema in the database where the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published.
    You must use “SASLIB” as the schema name for Greenplum in-database processing to work correctly.
    You specify that database in the DATABASE argument of the %INDGP_PUBLISH_COMPILEUDF macro. For more information, see %INDGP_PUBLISH_COMPILEUDF Macro Syntax.
    The SASLIB schema will contain the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions.
  2. Start SAS 9.2 and submit the following commands in the Enhanced Editor or Program Editor:
    %indgppc;
    %let indconn = user=youruserid password=yourpwd dsn=yourdsn;
    /* You can use server=yourserver database=yourdb instead of dsn=yourdsn */
    %indgp_publish_compileudf(
       objpath=full-path-to-pkglibdir/SAS,
       database=yourdatabase,
       action=create,
       outdir=youroutputdirectory);
    
For more information about the arguments for the %INDGP_PUBLISH_COMPILEUDF macro, see %INDGP_PUBLISH_COMPILEUDF Macro Syntax.
You can verify that the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions have been published successfully. For more information, see Validating the Publishing of the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF Functions.

%INDGPPC Macro

The %INDGPPC macro is an autocall library that initializes the %INDGP_PUBLISH_COMPILEUDF macro.

INDCONN Macro Variable

The INDCONN macro variable provides the credentials to make a connection to Greenplum. You must specify the user, password, and either the DSN or server and database information to access the machine on which you have installed the Greenplum database. You must assign the INDCONN macro variable before the %INDGP_PUBLISH_COMPILEUDF macro is invoked.
The value of the INDCONN macro variable for the %INDGP_PUBLISH_COMPILEUDF macro has one of these formats:
USER=<'>userid<'> PASSWORD=<'>password<'> DSN=<'>dsnname
USER=<'>userid<'> PASSWORD=<'>password<'> SERVER=<'>server<'>
DATABASE=<'>database<'>
USER=<'>userid<'>
specifies the Greenplum user name (also called the user ID) that is used to connect to the database. If the user name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
PASSWORD=<'>password<'>
specifies the password that is associated with your Greenplum user ID. If the password contains spaces or nonalphabetic characters, you must enclose it in quotation marks.
Tip:You can use only PASSWORD= or PW= for the password argument. Other aliases such as PASS= or PWD= are not supported and cause an error.
DSN=<'>datasource<'>
specifies the configured Greenplum ODBC data source to which you want to connect. If the DSN name contains spaces or nonalphabetic characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
SERVER=<'>server<'>
specifies the Greenplum server name or the IP address of the server host. If the server name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
DATABASE=<'>database<'>
specifies the Greenplum database that contains the tables and views that you want to access. If the database name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable
Note: The default port that is specified by Greenplum is 5432.
Note: The SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published to the SASLIB schema in the specified database. The SASLIB schema must be created before publishing the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions.

%INDGP_PUBLISH_COMPILEUDF Macro Syntax

%INDGP_PUBLISH_COMPILEUDF(
OBJPATH=full-path-to-pkglibdir/SAS
<, DATABASE=database-name>
<, ACTION=CREATE | REPLACE | DROP>
<, OUTDIR=diagnostic-output-directory>
);
Arguments
OBJPATH=full-path-to-pkglibdir/SAS
specifies the parent directory where all the object files are stored.
Tip:The full-path-to-pkglibdir directory was created during installation of the TAR file. You can use the pg_config --pkglibdir command to determine the name of the full-path-to-pkglibdir directory.
DATABASE=database-name
specifies the name of a Greenplum database to which the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published.
ACTION=CREATE | REPLACE | DROP
specifies that the macro performs one of the following actions:
CREATE
creates a new SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF function.
REPLACE
overwrites the current SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions, if a function by the same name is already registered, or creates a new SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF function if one is not registered.
DROP
causes the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions to be dropped from the Greenplum database.
Default:CREATE
Tip:If the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions were published previously and you now specify ACTION=CREATE, you receive warning messages from Greenplum that the functions already exist and you are prompted to use REPLACE. If the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions were published previously and you specify ACTION=REPLACE, no warnings are issued.
OUTDIR=output-directory
specifies a directory that contains diagnostic files.
Tip:Files that are produced include an event log that contains detailed information about the success or failure of the publishing process.

Validating the Publishing of the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF Functions

To validate that the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are registered properly under the SASLIB schema in the specified database, follow these steps.
  1. Use psql to connect to the database.
    psql -d databasename
    You should receive the following prompt.
    databasename=#
  2. At the prompt, enter the following command.
    select prosrc from pg_proc f, pg_namespace s where f.pronamespace=s.oid 
        and upper(s.nspname)='SASLIB';
    You should receive a result similar to the following:
    SAS_CompileUDF
    SAS_CopyUDF
    SAS_DirectoryUDF
    SAS_DehexUDF

Greenplum Permissions for the SAS Accelerator Publishing Agent

To publish the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, SAS_DEHEXUDF, and scoring model functions, Greenplum requires that you have superuser permissions to create and execute these functions in the SASLIB schema and in the specified database.