In-Database Deployment Package for Greenplum

Prerequisites

SAS Foundation and the SAS/ACCESS Interface to Greenplum must be installed before you install and configure the in-database deployment package for Greenplum.

Overview of the In-Database Deployment Package for Greenplum

This section describes how to install and configure the in-database deployment package for Greenplum (SAS Formats Library for Greenplum 2.2).
The in-database deployment package for Greenplum must be installed and configured before you can perform the following tasks:
  • Use the %INDGP_PUBLISH_FORMATS format publishing macro to create or publish the SAS_PUT( ) function and to create or publish user-defined formats as format functions inside the database.
  • Use the %INDGP_PUBLISH_MODEL scoring publishing macro to create scoring model functions inside the database.
The format and scoring publishing macros are included in the SAS/ACCESS Interface to Greenplum. For more information about using the format and scoring publishing macros, see the SAS In-Database Products: User's Guide.
The in-database deployment package for Greenplum contains the SAS formats library and precompiled binary files for the publishing macros.
The SAS formats library is a run-time library that is installed on your Greenplum system. This installation is done so that the SAS scoring model functions and the SAS_PUT( ) function created in Greenplum can access the routines within the run-time library.
The %INDGP_PUBLISH_COMPILEUDF macro registers utility functions in the database. The utility functions are called by the format and scoring publishing macros, %INDGP_PUBLISH_MODEL. You must run this macro before you run the format and scoring publishing macros.

Function Publishing Process in Greenplum

To publish the scoring model functions and the SAS_PUT( ) function to a Greenplum database, the publishing macros perform the following tasks:
  • Create and transfer the source files to the Greenplum server.
    The files are transferred through database tables. Before transfer, each source file is divided into 32K blocks and converted to hexadecimal values to avoid problems with special characters, such as line feed or quotation marks. After the files are exported to a temporary directory on the database server, the source files are converted back to text.
  • Compile those source files into object files using the appropriate compiler for the Greenplum system.
  • Link with the SAS formats library.
  • Copy the shared object files to full-path-to-pkglibdir/SAS. The object files are loaded when the scoring model functions are called.
  • Register the format and scoring model functions in Greenplum with those object files. If an existing format or scoring model function is replaced, the publishing macros replace the obsolete object file upon successful compilation and publication of the new format or scoring model function.

Greenplum Installation and Configuration Steps

  1. If you are upgrading from or reinstalling a previous release, follow the instructions in Upgrading from or Reinstalling a Previous Version before installing the in-database deployment package.
  2. Move and unpack the SAS formats library and binary files for the publishing macro.
  3. Run the %INDGP_PUBLISH_COMPILEUDF macro.

Upgrading from or Reinstalling a Previous Version

If you are upgrading from or reinstalling a previous version, follow the instructions in "Greenplum Installation and Configuration Steps." However, you need to run the %INDGP_PUBLISH_COMPILEUDF macro twice.
The CopySASFiles.sh install script replaces existing versions of most files. However, you need to replace the existing SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions after you run the CopySASFiles.sh install script. To do this, run the %INDGP_PUBLISH_COMPILEUDF macro with ACTION=DROP. Then rerun the %INDGP_PUBLISH_COMPILEUDF macro with ACTION=CREATE. For more information, see Running the %INDGP_PUBLISH_COMPILEUDF Macro.

Moving and Unpacking the SAS Formats Library and Binary Files

The SAS formats library and the binary files for the publishing macros are contained in a self-extracting archive file. The self-extracting archive file is located in the SAS-install-directory/SASFormatsLibraryforGreenplum/2.2/GreenplumonLinux64/ directory.
To move and unpack the self-extracting archive file, follow these steps:
  1. Using a method of your choice, transfer the accelgplmfmt-2.2-n_lax.sh file to your Greenplum master node. n is a number that indicates the latest version of the file. If this is the initial installation, nhas a value of 1. Each time you reinstall or upgrade, n is incremented by 1.
    The file does not have to be downloaded to a specific location. However, you need to note where it is downloaded so that it can be executed at a later time.
  2. After the accelgplmfmt-2.2-n_lax.sh has been transferred, log in to the Greenplum master node.
  3. Move to the directory where the self-extracting archive file was downloaded.
  4. Use the following command at the UNIX prompt to unpack the self-extracting archive file.
    ./accelgplmfmt-2.2-1_lax.sh
    Note: If you receive a “permissions denied” message, check the permissions on the accelgplmfmt-2.2-n_lax.sh file. This file must have EXECUTE permissions to run.
    After the script runs and the files are unpacked, the content of the target directories should look similar to these where path_to_tar_file is the location to which you copied the self-extracting archive file.
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/2.2-1/bin/
        InstallAccelGplmFmt.sh
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/2.2-1/bin/
        CopySASFiles.sh 
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/2.2-1/lib/
        SAS_CompileUDF.so
    /path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/2.2-1/lib/
        libjazxfbrs.so 
  5. Use the following command to place the files in Greenplum:
    ./path_to_tar_file/SAS/SASFormatsLibraryForGreenplum/2.2-1/bin/
        CopySASFiles.sh
    All the SAS object files are stored under full-path-to-pkglibdir/SAS. The files are copied to the master node and each of the segment nodes. This command replaces all previous versions of the libjazxfbrs.so file.
    Note: You can use the following command to determine the full-path-to-pkglibdir directory:
    $ pg_config --pkglibdir
    The pg_config --pkglibdir command must be run by the person who performed the Greenplum install.
    Note: If you add new nodes at a later date, you must copy all the binary files to the new nodes. For more information, see Step 6.
  6. (Optional) If you add new nodes to the Greenplum master node after the initial installation of the SAS formats library and publishing macro, you must copy all the binaries in the full-path-to-pkglibdir/SAS directory, including SAS_CompileUDF.so, libjazxfbrs.so, and the binary files for the already published functions, to the new nodes using a method of your choice.
    In addition, you must follow these steps from the master node to create the symbolic links to the SAS formats library for Greenplum (libjazfbrs.so).
    The symbolic links are created where the library was loaded on each node in the database array including the master and all segments.
    1. Use the following command to determine the full path to where the library was loaded.
      $ pg_config --libdir
      This is the path where the symbolic link is created.
    2. Use the following command to determine the SAS In-Database shared library deployment path.
      $ pg_config --pkglibdir
      This is the path that is linked to and where the SAS formats library is deployed.
    3. Use the following command to create the symbolic link on the master node.
      $ ln -s path-from-pg_config --pkglibdir/SAS/libjazxfbrs.so 
         path-from-pg_config --libdir/libjazxfbrs.so
      Use the value from Step 6b for path-from-pg_config --pkglibdir. Use the value from Step 6a for path-from-pg_config --libdir.
    4. Use the following commands to connect to each of the segment nodes and create the symbolic links on each of the nodes.
      /* Use this command from the master node to connect to each segment node */
      $ ssh <segment nodename>
      /* Use this commmand on each segment node to create the link */
      $ ln -s path-from-pg_config --pkglibdir/SAS/libjazxfbrs.so 
          path-from-pg_config --libdir/libjazxfbrs.so
      
      Use the value from Step 6b for path-from-pg_config --pkglibdir. Use the value from Step 6a for path-from-pg_config --libdir.
      To verify that the link is created correctly, go to the directory that results from running the pg_config --libdir command and list libjazxfbrs.so.

Running the %INDGP_PUBLISH_COMPILEUDF Macro

Overview of the %INDGP_PUBLISH_COMPILEUDF Macro

The %INDGP_PUBLISH_COMPILEUDF macro publishes the following functions to the SASLIB schema in a Greenplum database:
  • SAS_COMPILEUDF function
    This function facilitates the %INDGP_PUBLISH_FORMATS format publishing macro and the %INDGP_PUBLISH_MODEL scoring publishing macro. The SAS_COMPILEUDF function performs the following tasks:
    • compiles the format and scoring model source files into object files. This compilation occurs through the SQL interface using an appropriate compiler for the system.
    • links with the SAS formats library.
    • copies the object files to the full-path-to-pkglibdir/SAS directory. All the SAS object files are stored under full-path-to-pkglibdir/SAS. You can use the pg_config --pkglibdir command to determine the full-path-to-pkglibdir directory.
  • Three utility functions that are used when the scoring publishing macro transfers source files from the client to the host:
    • SAS_COPYUDF function
      This function copies the shared libraries to the full-path-to-pkglibdir/SAS path on the whole database array including the master and all segments.
    • SAS_DIRECTORYUDF function
      This function creates and removes a temporary directory that holds the source files on the server.
    • SAS_DEHEXUDF function
      This function converts the files from hexadecimal back to text after the files are exported on the host.
    For more information about the file transfer process, see Function Publishing Process in Greenplum.
You have to run the %INDGP_PUBLISH_COMPILEUDF macro only one time in each database.
The SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions must be published before you run the %INDGP_PUBLISH_FORMATS or the %INDGP_PUBLISH_MODEL macro. Otherwise, these macros fail.
Note: To publish the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions, you must have superuser permissions to create and execute these functions in the SASLIB schema and in the specified database.

%INDGP_PUBLISH_COMPILEUDF Macro Run Process

To run the %INDGP_PUBLISH_COMPILEUDF macro, follow these steps:
Note: To publish the SAS_COMPILEUDF function, you must have superuser permissions to create and execute this function in the SASLIB schema and in the specified database.
  1. Create a SASLIB schema in the database where the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published.
    You must use “SASLIB” as the schema name for Greenplum in-database processing to work correctly.
    You specify that database in the DATABASE argument of the %INDGP_PUBLISH_COMPILEUDF macro. For more information, see %INDGP_PUBLISH_COMPILEUDF Macro Syntax.
    The SASLIB schema contains the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions.
  2. Start SAS 9.3 and submit the following commands in the Enhanced Editor or Program Editor:
    %indgppc;
    %let indconn = user=youruserid password=yourpwd dsn=yourdsn;
    /* You can use server=yourserver database=yourdb instead of dsn=yourdsn */
    
    For more information, see %INDGPPC Macro and INDCONN Macro Variable.
  3. Run the %INDGP_PUBLISH_COMPILEUDF macro. For more information, see %INDGP_PUBLISH_COMPILEUDF Macro Syntax.
You can verify that the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions have been published successfully. For more information, see Validating the Publishing of the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF Functions.

%INDGPPC Macro

The %INDGPPC macro is an autocall library that initializes the %INDGP_PUBLISH_COMPILEUDF macro.

INDCONN Macro Variable

The INDCONN macro variable provides the credentials to make a connection to Greenplum. You must specify the user, password, and either the DSN or server and database information to access the machine on which you have installed the Greenplum database. You must assign the INDCONN macro variable before the %INDGP_PUBLISH_COMPILEUDF macro is invoked.
The value of the INDCONN macro variable for the %INDGP_PUBLISH_COMPILEUDF macro has one of these formats:
USER=<'>userid<'> PASSWORD=<'>password<'> DSN=<'>dsnname
USER=<'>userid<'> PASSWORD=<'>password<'> SERVER=<'>server<'>
DATABASE=<'>database<'>
USER=<'>userid<'>
specifies the Greenplum user name (also called the user ID) that is used to connect to the database. If the user name contains spaces or nonalphanumeric characters, enclose the user name in quotation marks.
PASSWORD=<'>password<'>
specifies the password that is associated with your Greenplum user ID. If the password contains spaces or nonalphabetic characters, enclose the password in quotation marks.
Tip You can use only PASSWORD=, PASS=, or PW= for the password argument. PWD= is not supported and causes an error.
DSN=<'>datasource<'>
specifies the configured Greenplum ODBC data source to which you want to connect. If the DSN name contains spaces or nonalphabetic characters, enclose the DSN name in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
SERVER=<'>server<'>
specifies the Greenplum server name or the IP address of the server host. If the server name contains spaces or nonalphanumeric characters, enclose the server name in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable.
DATABASE=<'>database<'>
specifies the Greenplum database that contains the tables and views that you want to access. If the database name contains spaces or nonalphanumeric characters, enclose the database name in quotation marks.
Requirement: You must specify either the DSN= argument or the SERVER= and DATABASE= arguments in the INDCONN macro variable
Note: The default port that is specified by Greenplum is 5432.
Note: The SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published to the SASLIB schema in the specified database. The SASLIB schema must be created before publishing the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions.

%INDGP_PUBLISH_COMPILEUDF Macro Syntax

%INDGP_PUBLISH_COMPILEUDF
(OBJPATH=full-path-to-pkglibdir/SAS
<, DATABASE=database-name>
<, ACTION=CREATE | REPLACE | DROP>
<, OUTDIR=diagnostic-output-directory>
);
Arguments
OBJPATH=full-path-to-pkglibdir/SAS
specifies the parent directory where all the object files are stored.
Tip The full-path-to-pkglibdir directory was created during installation of the self-extracting archive file. You can use the pg_config --pkglibdir command to determine the name of the full-path-to-pkglibdir directory.
DATABASE=database-name
specifies the name of a Greenplum database to which the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are published.
Restriction If you specify DSN= in the INDCONN macro variable, do not use the DATABASE argument.
ACTION=CREATE | REPLACE | DROP
specifies that the macro performs one of the following actions:
CREATE
creates a new SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF function.
REPLACE
overwrites the current SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions, if a function by the same name is already registered, or creates a new SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF function if one is not registered.
DROP
causes the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions to be dropped from the Greenplum database.
Default CREATE
Tip If the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions were published previously and you specify ACTION=CREATE, you receive warning messages that the functions already exist and you are prompted to use REPLACE. If the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions were published previously and you specify ACTION=REPLACE, no warnings are issued.
OUTDIR=output-directory
specifies a directory that contains diagnostic files.
Tip Files that are produced include an event log that contains detailed information about the success or failure of the publishing process.

Validating the Publishing of the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF Functions

To validate that the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, and SAS_DEHEXUDF functions are registered properly under the SASLIB schema in the specified database, follow these steps.
  1. Use psql to connect to the database.
    psql -d databasename
    You should receive the following prompt.
    databasename=#
  2. At the prompt, enter the following command.
    select prosrc from pg_proc f, pg_namespace s where f.pronamespace=s.oid 
        and upper(s.nspname)='SASLIB';
    You should receive a result similar to the following:
    SAS_CompileUDF
    SAS_CopyUDF
    SAS_DirectoryUDF
    SAS_DehexUDF

Greenplum Permissions

To publish the SAS_COMPILEUDF, SAS_COPYUDF, SAS_DIRECTORYUDF, SAS_DEHEXUDF, and the format and scoring model functions, Greenplum requires that you have superuser permissions to create and execute these functions in the SASLIB schema and in the specified database.
If you plan to use SAS Model Manager with the SAS Scoring Accelerator for in-database scoring, additional permissions are required. For more information, see Configurations for SAS Model Manager.

Documentation for Publishing SAS Formats and Scoring Models in Greenplum

For information about how to publish SAS formats and scoring models, see the SAS In-Database Products: User's Guide, located at http://support.sas.com/documentation/onlinedoc/indbtech/index.html.