Previous Page | Next Page

SAS/ACCESS Interface to Netezza

Deploying and Using SAS Formats in Netezza


Using SAS Formats

SAS formats are basically mapping functions that change an element of data from one format to another. For example, some SAS formats change numeric values to various currency formats or date-and-time formats.

SAS supplies many formats. You can also use the SAS FORMAT procedure to define custom formats that replace raw data values with formatted character values. For example, this PROC FORMAT code creates a custom format called $REGION that maps ZIP codes to geographic regions.

proc format;
   value $region
   '02129', '03755', '10005' = 'Northeast'
   '27513', '27511', '27705' = 'Southeast'
   '92173', '97214', '94105' = 'Pacific';
run;

SAS programs frequently use both user-defined formats and formats that SAS supplies. Although they are referenced in numerous ways, using the PUT function in the SQL procedure is of particular interest for SAS In-Database processing.

The PUT function takes a format reference and a data item as input and returns a formatted value. This SQL procedure query uses the PUT function to summarize sales by region from a table of all customers:

select put(zipcode,$region.) as region,
   sum(sales) as sum_sales from sales.customers
   group by region;

The SAS SQL processor knows how to process the PUT function. Currently, SAS/ACCESS Interface to Netezza returns all rows of unformatted data in the SALES.CUSTOMERS table in the Netezza database to the SAS System for processing.

The SAS In-Database technology deploys, or publishes, the PUT function implementation to Netezza as a new function named SAS_PUT(). Similar to any other programming language function, the SAS_PUT() function can take one or more input parameters and return an output value.

The SAS_PUT() function supports use of SAS formats. You can specify the SAS_PUT() function in SQL queries that SAS submits to Netezza in one of two ways:

If you used the SAS_PUT() function in the previous example, Netezza formats the ZIP code values with the $REGION format and processes the GROUP BY clause using the formatted values.

By publishing the PUT function implementation to Netezza as the SAS_PUT() function, you can realize these advantages:

Deploying SAS formats to execute inside a Netezza database can enhance performance and exploit Netezza parallel processing.


How It Works

By using the SAS formats publishing macro, you can generate a SAS_PUT() function that enables you to execute PUT function calls inside the Netezza data warehouse. You can reference the formats that SAS supplies and most custom formats that you create by using PROC FORMAT.

The SAS formats publishing macro takes a SAS format catalog and publishes it to the Netezza data warehouse. Inside the Netezza data warehouse, a SAS_PUT() function, which emulates the PUT function, is created and registered for use in SQL queries.

Process Flow Diagram

[Process Flow Diagram]

Here is the basic process flow.

[1] Install the components that are necessary for in-database processing in the Netezza data warehouse.

For more information, see Deployed Components for In-Database Processing.

Note:   This is a one-time installation process.  [cautionend]

[2] If you need to, create your custom formats by using PROC FORMAT and use the LIBRARY= option to create a permanent catalog.

For more information, see User-Defined Formats in the Netezza Data Warehouse and the FORMAT procedure in the Base SAS Procedures Guide.

[3] Start SAS 9.2 and run the SAS publishing macros.

For more information, see Publishing SAS Formats.

[4] After the SAS_PUT() function is created, it is available to use in any SQL expression in the same way that Netezza built-in functions are used.

For more information, see Using the SAS_PUT() Function in the Netezza Data Warehouse.


Deployed Components for In-Database Processing

Components that are deployed to Netezza for in-database processing are contained in a self-extracting TAR file on the SAS Software Depot.

The following components are deployed:

For more information about creating the SAS Software Depot, see your Software Order e-mail. For more information about installing and configuring these components, see the SAS In-Database Products: Administrator's Guide.


User-Defined Formats in the Netezza Data Warehouse

You can use PROC FORMAT to create user-defined formats and store them in a format catalog. You can then use the %INDNZ_PUBLISH_FORMATS macro to export the user-defined format definitions to the Netezza data warehouse where the SAS_PUT() function can reference them.

If you use the FMTCAT= option to specify a format catalog in the %INDNZ_PUBLISH_FORMATS macro, these restrictions and limitations apply:


Publishing SAS Formats


Overview of the Publishing Process

The SAS publishing macros are used to publish formats and the SAS_PUT() function in Netezza.

The %INDNZ_PUBLISH_FORMATS macro creates the files that are needed to build the SAS_PUT() function and publishes those files to the Netezza data warehouse.

This macro also registers the formats that are included in the SAS 9.2 Formats Library for Netezza. In addition to registering the formats that SAS supplies, you can publish the PROC FORMAT definitions that are contained in a single SAS format catalog. The process of publishing a PROC FORMAT catalog entry converts the value-range-sets, for example, 1='yes' 2='no', into embedded data in Netezza. For more information on value-range-sets, see PROC FORMAT in the Base SAS Procedures Guide.

The %INDNZ_PUBLISH_FORMATS macro performs the following tasks:


Running the %INDNZ_PUBLISH_FORMATS Macro

To run the %INDNZ_PUBLISH_FORMATS macro, complete the following steps:

  1. Start SAS 9.2 and submit these commands in the Program Editor or Enhanced Editor:

    %indnzpf;
    %let indconn = server=myserver user=myuserid password=XXXX
    database=mydb;

    The %INDNZPF macro is an autocall library that initializes the format publishing software.

    The INDCONN macro variable is used as credentials to connect to Netezza. You must specify the server, user, password, and database to access the machine on which you have installed the Netezza data warehouse. You must assign the INDCONN macro variable before the %INDNZ_PUBLISH_FORMATS macro is invoked.

    Here is the syntax for the value of the INDCONN macro variable:

    SERVER=server USER=userid PASSWORD=password
    DATABASE=database

    Note:   You can use only PASSWORD= or PW= for the password argument. Other aliases such as PASS= or PWD= are not supported and cause errors.  [cautionend]

    Note:   The INDCONN macro variable is not passed as an argument to the %INDNZ_PUBLISH_FORMATS macro. This information can be concealed in your SAS job. You might want to place it in an autoexec file and set the permissions on the file so that others cannot access the user ID and password.  [cautionend]

  2. Run the %INDNZ_PUBLISH_FORMATS macro. For more information, see %INDNZ_PUBLISH_FORMATS Macro Syntax.

    Messages are written to the SAS log that indicate whether the SAS_PUT() function was successfully created.


%INDNZ_PUBLISH_FORMATS Macro Syntax

%INDNZ_PUBLISH_FORMATS (
<, DATABASE=database-name>
<. DBCOMPILE=database-name>
<, DBJAZLIB=database-name>
<, FMTCAT=format-catalog-filename>
<, FMTTABLE=format-table-name>
<, ACTION=CREATE | REPLACE | DROP>
<, OUTDIR=diagnostic-output-directory>
);
Arguments
DATABASE=database-name

specifies the name of a Netezza database to which the SAS_PUT() function and the formats are published. This argument lets you publish the SAS_PUT() function and the formats to a shared database where other users can access them.

Interaction: The database that is specified by the DATABASE= argument takes precedence over the database that you specify in the INDCONN macro variable. For more information, see Running the %INDNZ_PUBLISH_FORMATS Macro.
Tip: It is not necessary that the format definitions and the SAS_PUT() function reside in the same database as the one that contains the data that you want to format. You can use the SQLMAPPUTO= system option to specify the database where the format definitions and the SAS_PUT() function have been published.
DBCOMPILE=database-name

specifies the name of the database where the SAS_COMPILEUDF function was published.

Default: SASLIB
See: For more information about the publishing the SAS_COMPILEUDF function, see the SAS In-Database Products: Administrator's Guide.
DBJAZLIB=database-name

specifies the name of the database where the SAS 9.2 Formats Library for Netezza was published.

Default: SASLIB
Restriction: This argument is supported only on TwinFin systems.
See: For more information about publishing the SAS 9.2 Formats Library on TwinFin systems, see the SAS In-Database Products: Administrator's Guide.
FMTCAT=format-catalog-filename

specifies the name of the format catalog file that contains all user-defined formats that were created with the FORMAT procedure and will be made available in Netezza.

Default: If you do not specify a value for FMTCAT= and you have created user-defined formats in your SAS session, the default is WORK.FORMATS. If you do not specify a value for FMTCAT= and you have not created any user-defined formats in your SAS session, only the formats that SAS supplies are available in Netezza.
Interaction: If the format definitions that you want to publish exist in multiple catalogs, you must copy them into a single catalog for publishing.
Interaction: If you specify more than one format catalog using the FMTCAT argument, only the last catalog specified is published.
Interaction: If you do not use the default catalog name (FORMATS) or the default library (WORK or LIBRARY) when you create user-defined formats, you must use the FMTSEARCH system option to specify the location of the format catalog. For more information, see PROC FORMAT in the Base SAS Procedures Guide.
See Also: User-Defined Formats in the Netezza Data Warehouse
FMTTABLE=format-table-name

specifies the name of the Netezza table that contains all formats that the %INDNZ_PUBLISH_FORMATS macro creates and that the SAS_PUT() function supports. The table contains the columns shown in Table 2.1.

Format Table Columns
Column Name Description
FMTNAME specifies the name of the format.
SOURCE specifies the origin of the format. SOURCE can contain one of these values:
SAS

supplied by SAS

PROCFMT

User-defined with PROC FORMAT

Default: If FMTTABLE is not specified, no table is created. You can see only the SAS_PUT() function. You cannot see the formats that are published by the macro.
Interaction: If ACTION=CREATE or ACTION=DROP is specified, messages are written to the SAS log that indicate the success or failure of the table creation or drop.
ACTION=CREATE | REPLACE | DROP

specifies that the macro performs one of these actions:

CREATE

creates a new SAS_PUT() function.

REPLACE

overwrites the current SAS_PUT() function, if a SAS_PUT() function is already registered or creates a new SAS_PUT() function if one is not registered.

DROP

causes the SAS_PUT() function to be dropped from the Netezza database.

Interaction: If FMTTABLE= is specified, both the SAS_PUT() function and the format table are dropped. If the table name cannot be found or is incorrect, only the SAS_PUT() function is dropped.
Default: CREATE
Tip: If the SAS_PUT() function was published previously and you specify ACTION=CREATE or REPLACE, no warning is issued. Also, if you specify ACTION=DROP and the SAS_PUT() function does not exist, no warning is issued.
OUTDIR=diagnostic-output-directory

specifies a directory that contains diagnostic files.

Files that are produced include an event log that contains detailed information about the success or failure of the publishing process.

See: Special Characters in Directory Names

Tips for Using the %INDNZ_PUBLISH_FORMATS Macro


Special Characters in Directory Names

If the directory names that are used in the macros contain any of the following special characters, you must mask the characters by using the %STR macro quoting function. For more information, see the %STR function and macro string quoting topic in SAS Macro Language: Reference.

Special Characters in Directory Names
Character How to Represent
blank1 %str()
*2 %str(*)
; %str(;)
, (comma) %str(,)
= %str(=)
+ %str(+)
- %str(-)
> %str(>)
< %str(<)
^ %str(^)
| %str(|)
& %str(&)
# %str(#)
/ %str(/)
~ %str(~)
% %str(%%)
' %str(%')
" %str(%")
( %str(%()
) %str(%))
¬ %str(¬)
1 Only leading blanks require the %STR function, but you should avoid using leading blanks in directory names.

2 Asterisks (*) are allowed in UNIX directory names. Asterisks are not allowed in Windows directory names. In general, avoid using asterisks in directory names.

Here are some examples of directory names with special characters:

Examples of Special Characters in Directory Names
Directory Code Representation
c:\temp\Sales(part1)
c:\temp\Sales%str(%()part1%str(%))
c:\temp\Drug "trial" X
c:\temp\Drug %str(%")trial(%str(%") X
c:\temp\Disc's 50% Y
c:\temp\Disc%str(%')s 50%str(%%) Y
c:\temp\Pay,Emp=Z
c:\temp\Pay%str(,)Emp%str(=)Z


Netezza Permissions

You must have permission to create the SAS_PUT() function and formats, and tables in the Netezza database. You must also have permission to execute the SAS_COMPILEUDF, SAS_DIRECTORYUDF, and SAS_HEXTOTEXTUDF functions in either the SASLIB database or the database specified in lieu of SASLIB where these functions are published.

Without these permissions, the publishing of the SAS_PUT() function and formats fail. To obtain these permissions, contact your database administrator.

For more information on specific permissions, see the SAS In-Database Products: Administrator's Guide.


Format Publishing Macro Example

%indnzpf;
%let indconn = server=netezbase user=user1 password=open1 database=mydb;
%indnz_publish_formats(fmtcat= fmtlib.fmtcat);

This sequence of macros generates .c, .cpp, and .h files for each data type. The format data types that are supported are numeric (FLOAT, INT), character, date, time, and timestamp (DATETIME). The %INDNZ_PUBLISH_FORMATS macro also produces a text file of Netezza CREATE FUNCTION commands that are similar to these:

CREATE FUNCTION sas_put(float , varchar(256))
RETURNS VARCHAR(256)
LANGUAGE CPP 
PARAMETER STYLE npsgeneric
CALLED ON NULL INPUT
EXTERNAL CLASS NAME 'Csas_putn'
EXTERNAL HOST OBJECT '/tmp/tempdir_20090528T135753_616784/formal5.o_x86'
EXTERNAL NSPU OBJECT '/tmp/tempdir_20090528T135753_616784/formal5.o_diab_ppc'

After it is installed, you can call the SAS_PUT() function in Netezza by using SQL. For more information, see Using the SAS_PUT() Function in the Netezza Data Warehouse.


Using the SAS_PUT() Function in the Netezza Data Warehouse


Implicit Use of the SAS_PUT() Function

After you install the formats that SAS supplies in libraries inside the Netezza data warehouse and publish any custom format definitions that you created in SAS, you can access the SAS_PUT() function with your SQL queries.

If the SQLMAPPUTTO= system option is set to SAS_PUT and you submit your program from a SAS session, the SAS SQL processor maps PUT function calls to SAS_PUT() function references that Netezza understands.

This example illustrates how the PUT function is mapped to the SAS_PUT() function using implicit pass-through.

options sqlmapputto=sas_put;

%put &mapconn;
 
libname dblib netezza &mapconn;

   /*-- Set SQL debug global options --*/
   /*----------------------------------*/
   options sastrace=',,,d' sastraceloc=saslog;

   /*-- Execute SQL using Implicit Passthru --*/
   /*-----------------------------------------*/
   proc sql noerrorstop;
      title1 'Test SAS_PUT using Implicit Passthru ';
      select distinct
         PUT(PRICE,Dollar8.2) AS PRICE_C
            from dblib.mailorderdemo;
   
   quit;

These lines are written to the SAS log.

options sqlmapputto=sas_put;

%put &mapconn;
user=dbitext password=dbigrp1 server=spubox database=TESTDB
   sql_functions="EXTERNAL_APPEND=WORK.dbfuncext" sql_functions_copy=saslog;
 
libname dblib netezza &mapconn;

NOTE: Libref DBLIB was successfully assigned, as follows:
      Engine:        NETEZZA
      Physical Name: spubox

         /*-- Set SQL debug global options --*/
         /*----------------------------------*/
        options sastrace=',,,d' sastraceloc=saslog;

         /*-- Execute SQL using Implicit Passthru --*/
         /*-----------------------------------------*/
         proc sql noerrorstop;
           title1 'Test SAS_PUT using Implicit Passthru ';
           select distinct
              PUT(PRICE,Dollar8.2) AS PRICE_C
            from dblib.mailorderdemo
                ;
NETEZZA: AUTOCOMMIT is NO for connection 1
NETEZZA: AUTOCOMMIT turned ON for connection id 1
 
NETEZZA_1: Prepared: on connection 1
SELECT * FROM mailorderdemo
 
NETEZZA: AUTOCOMMIT is NO for connection 2
NETEZZA: AUTOCOMMIT turned ON for connection id 2
 
NETEZZA_2: Prepared: on connection 2
 select distinct cast(sas_put(mailorderdemo."PRICE", 'DOLLAR8.2') as char(8)) 
        as PRICE_C from mailorderdemo
 
NETEZZA_3: Executed: on connection 2
Prepared statement NETEZZA_2
 
ACCESS ENGINE:  SQL statement was passed to the DBMS for fetching data.

               Test SAS_PUT using Implicit Passthru                        9
                                                13:42 Thursday, May 7, 2009

                             PRICE_C
                             _______
                              $10.00
                              $12.00
                              $13.59
                              $48.99
                              $54.00
                               $8.00
                              $14.00
                              $27.98
                              $13.99

           quit;

Be aware of these items:

The SELECT DISTINCT clause executes inside Netezza, and the processing is distributed across all available data nodes. Netezza formats the price values with the $DOLLAR8.2 format and processes the SELECT DISTINCT clause using the formatted values.


Explicit Use of the SAS_PUT() Function

If you use explicit pass-through (direct connection to Netezza), you can use the SAS_PUT() function call in your SQL program.

This example shows the same query from Implicit Use of the SAS_PUT() Function and explicitly uses the SAS_PUT() function call.

options sqlmapputto=sas_put sastrace=',,,d' sastraceloc=saslog;

proc sql noerrorstop;
   title1 'Test SAS_PUT using Explicit Passthru';
   connect to netezza (user=dbitest password=XXXXXXX database=testdb
      server=spubox);

   select * from connection to netezza
      (select distinct cast(sas_put("PRICE",'DOLLAR8.2') as char(8)) as
         "PRICE_C" from mailorderdemo);

disconnect from netezza;
quit;

The following lines are written to the SAS log.

options sqlmapputto=sas_put sastrace=',,,d' sastraceloc=saslog;

proc sql noerrorstop;
 title1 'Test SAS_PUT using Explicit Passthru';
 connect to netezza (user=dbitest password=XXXXXXX database=testdb server=spubox);

 select * from connection to netezza
    (select distinct cast(sas_put("PRICE",'DOLLAR8.2') as char(8)) as
        "PRICE_C" from mailorderdemo);

            Test SAS_PUT using Explicit Passthru            2
                                            17:13 Thursday, May 7, 2009

                          PRICE_C
                          _______
                           $27.98
                           $10.00
                           $12.00
                           $13.59
                           $48.99
                           $54.00
                           $13.98
                            $8.00
                           $14.00

 disconnect from netezza;
 quit;

Note:   If you explicitly use the SAS_PUT() function in your code, it is recommended that you use double quotation marks around a column name to avoid any ambiguity with the keywords. For example, if you did not use double quotation marks around the column name, DATE, in this example, all date values would be returned as today's date.

select distinct
   cast(sas_put("price", 'dollar8.2') as char(8)) as "price_c",
   cast(sas_put("date", 'date9.1') as char(9)) as "date_d",
   cast(sas_put("inv", 'best8.') as char(8))  as "inv_n",
   cast(sas_put("name", '$32.') as char(32)) as "name_n"
from mailorderdemo;

  [cautionend]


Determining Format Publish Dates

You might need to know when user-defined formats or formats that SAS supplies were published. SAS supplies two special formats that return a datetime value that indicates when this occurred.

Note:   You must use the SQL pass-through facility to return the datetime value associated with the INTRINSIC-CRDATE and UFMT-CRDATE formats, as illustrated in this example.

proc sql noerrorstop;
   connect to &netezza (&connopt);

title 'Publish date of SAS Format Library';
select * from connection to &netezza
   (
      select sas_put(1, 'intrinsic-crdate.')
         as sas_fmts_datetime;
   );
title 'Publish date of user-defined formats';
select * from connection to &netezza
   (
      select sas_put(1, 'ufmt-crdate.')
         as my_formats_datetime;
   );

disconnect from netezza;
quit;

  [cautionend]

Previous Page | Next Page | Top of Page