Previous Page | Next Page

SAS/ACCESS Interface to Teradata

Deploying and Using SAS Formats in Teradata


Using SAS Formats

SAS formats are basically mapping functions that change an element of data from one format to another. For example, some SAS formats change numeric values to various currency formats or date-and-time formats.

SAS supplies many formats. You can also use the SAS FORMAT procedure to define custom formats that replace raw data values with formatted character values. For example, this PROC FORMAT code creates a custom format called $REGION that maps ZIP codes to geographic regions.

proc format;
   value $region
   '02129', '03755', '10005' = 'Northeast'
   '27513', '27511', '27705' = 'Southeast'
   '92173', '97214', '94105' = 'Pacific';
run;

SAS programs, including in-database procedures, frequently use both user-defined formats and formats that SAS supplies. Although they are referenced in numerous ways, using the PUT function in the SQL procedure is of particular interest for SAS In-Database processing.

The PUT function takes a format reference and a data item as input and returns a formatted value. This SQL procedure query uses the PUT function to summarize sales by region from a table of all customers:

select put(zipcode,$region.) as region,
   sum(sales) as sum_sales from sales.customers
   group by region;

The SAS SQL processor knows how to process the PUT function. Currently, SAS/ACCESS Interface to Teradata returns all rows of unformatted data in the SALES.CUSTOMERS table in the Teradata database to the SAS System for processing.

The SAS In-Database technology deploys, or publishes, the PUT function implementation to Teradata as a new function named SAS_PUT(). Similar to any other programming language function, the SAS_PUT() function can take one or more input parameters and return an output value.

The SAS_PUT() function supports use of SAS formats. You can specify the SAS_PUT() function in SQL queries that SAS submits to Teradata in one of two ways:

If you used the SAS_PUT() function in the previous example, Teradata formats the ZIP code values with the $REGION format and processes the GROUP BY clause using the formatted values.

By publishing the PUT function implementation to Teradata as the SAS_PUT() function, you can realize these advantages:

Deploying SAS formats to execute inside a Teradata database can enhance performance and exploit Teradata parallel processing.


How It Works

By using the SAS formats publishing macro, you can generate a SAS_PUT() function that enables you to execute PUT function calls inside the Teradata EDW. You can reference the formats that SAS supplies and most custom formats that you create by using PROC FORMAT.

The SAS formats publishing macro takes a SAS format catalog and publishes it to the Teradata EDW. Inside the Teradata EDW, a SAS_PUT() function, which emulates the PUT function, is created and registered for use in SQL queries.

Process Flow Diagram

[Process Flow Diagram]

Here is the basic process flow.

[1] Install the components that are necessary for in-database processing in the Teradata EDW.

Note:   This is a one-time installation process.  [cautionend]

For more information, see Deployed Components for In-Database Processing.

[2] If necessary, create your custom formats by using PROC FORMAT and create a permanent catalog by using the LIBRARY= option.

For more information, see User-Defined Formats in the Teradata EDW and the FORMAT procedure in the Base SAS Procedures Guide.

[3] Start SAS 9.2 and run the %INDTD_PUBLISH_FORMATS macro. This macro creates the files that are needed to build the SAS_PUT() function and publishes those files to the Teradata EDW.

For more information, see Publishing SAS Formats.

[4] After the %INDTD_PUBLISH_FORMATS macro creates the script, SAS/ACCESS Interface to Teradata executes the script and publishes the files to the Teradata EDW.

For more information, see Publishing SAS Formats.

[5] Teradata compiles the .c and .h files and creates the SAS_PUT() function. The SAS_PUT() function is available to use in any SQL expression and to use typically wherever you use Teradata built-in functions.

For more information, see Using the SAS_PUT() Function in the Teradata EDW.


Deployed Components for In-Database Processing

Components that are deployed to Teradata for in-database processing are contained in either an RPM file (Linux) or a PKG file (MP-RAS) in the SAS Software Depot.

The component that is deployed is the SAS 9.2 Formats Library for Teradata. The SAS 9.2 Formats Library for Teradata contains many of the formats that are available in Base SAS. After you install the SAS 9.2 Formats Library and run the %INDTD_PUBLISH_FORMATS macro, the SAS_PUT() function can call these formats.

Note:   The SAS Scoring Accelerator for Teradata also uses these libraries. For more information about this product, see the SAS Scoring Accelerator for Teradata: User's Guide.  [cautionend]

For more information about creating the SAS Software Depot, see the instructions in your Software Order e-mail. For more information about installing and configuring these components, see the SAS In-Database Products: Administrator's Guide.


User-Defined Formats in the Teradata EDW

You can use PROC FORMAT to create user-defined formats and store them in a format catalog. You can then use the %INDTD_PUBLISH_FORMATS macro to export the user-defined format definitions to the Teradata EDW where the SAS_PUT() function can reference them.

If you use the FMTCAT= option to specify a format catalog in the %INDTD_PUBLISH_FORMATS macro, these restrictions and limitations apply:


Data Types and the SAS_PUT() Function

The SAS_PUT() function supports direct use of the Teradata data types shown in Teradata Data Types Supported by the SAS_PUT() Function. In some cases, the Teradata database performs an implicit conversion of the input data to the match the input data type that is defined for the SAS_PUT() function. For example, all compatible numeric data types are implicitly converted to FLOAT before they are processed by the SAS_PUT() function.

Teradata Data Types Supported by the SAS_PUT() Function
Type of Data Data Type
Numeric BYTEINT

SMALLINT

INTEGER

BIGINT1

DECIMAL (ANSI NUMERIC)1

FLOAT (ANSI REAL or DOUBLE PRECISION)

Date and time DATE

TIME

TIMESTAMP

Character2, 3 CHARACTER4

VARCHAR

LONG VARCHAR

1 Numeric precision might be lost when inputs are implicitly converted to FLOAT before they are processed by the SAS_PUT() function.

2 Only the Latin-1 character set is supported for character data. UNICODE is not supported at this time.

3 When character inputs are larger than 256 characters, the results depend on the session mode associated with the Teradata connection.

  • In ANSI session mode (the typical SAS default mode) passing a character field larger than 256 results in a string truncation error.
  • In Teradata session mode, character inputs larger than 256 characters are silently truncated to 256 characters before the format is applied. The SAS/STAT procedures that have been enhanced for in-database processing use the Teradata session mode.

4 The SAS_PUT() function has a VARCHAR data type for its first argument when the value passed has a data type of CHARACTER. Therefore, columns with a data type of CHARACTER have their trailing blanks trimmed when converting to a VARCHAR data type.

The SAS_PUT() function does not support direct use of the Teradata data types shown in Teradata Data Types not Supported by the SAS_PUT() Function. In some cases, unsupported data types can be explicitly converted to a supported type by using SAS or SQL language constructs. For information about performing explicit data conversions, see Data Types for Teradata and your Teradata documentation.

Teradata Data Types not Supported by the SAS_PUT() Function
Type of Data Data Type
ANSI date and time INTERVAL

TIME WITH TIME ZONE

TIMESTAMP WITH TIME ZONE

GRAPHIC server character set GRAPHIC

VARGRAPHIC

LONG VARGRAPHIC

Binary and large object CLOB

BYTE

VARBYTE

BLOB

If an incompatible data type is passed to the SAS_PUT() function, various error messages can appear in the SAS log including these:


Publishing SAS Formats


Overview of the Publishing Process

The SAS publishing macros are used to publish formats and the SAS_PUT() function in the Teradata EDW.

The %INDTD_PUBLISH_FORMATS macro creates the files that are needed to build the SAS_PUT() function and publishes these files to the Teradata EDW.

The %INDTD_PUBLISH_FORMATS macro also registers the formats that are included in the SAS 9.2 Formats Library for Teradata. This makes many formats that SAS supplies available inside Teradata. For more information about the SAS 9.2 Formats Library for Teradata, see Deployed Components for In-Database Processing.

In addition to formats that SAS supplies, you can also publish the PROC FORMAT definitions that are contained in a single SAS format catalog by using the FMTCAT= option. The process of publishing a PROC FORMAT catalog entry converts the value-range-sets, for example, 1='yes' 2='no', into embedded data in Teradata. For more information on value-range-sets, see PROC FORMAT in the Base SAS Procedures Guide.

Note:   If you specify more than one format catalog using the FMTCAT= option, the last format that you specify is published.  [cautionend]

The %INDTD_PUBLISH_FORMATS macro performs the following tasks:


Running the %INDTD_PUBLISH_FORMATS Macro

Follow these steps to run the %INDTD_PUBLISH_FORMATS macro.

  1. Start SAS 9.2 and submit these commands in the Program Editor or Enhanced Editor:

    %indtdpf;
    %let indconn = server="myserver" user="myuserid" password="xxxx" 
       database="mydb";

    The %INDTDPF macro is an autocall library that initializes the format publishing software.

    The INDCONN macro variable is used as credentials to connect to Teradata. You must specify the server, user, password, and database information to access the machine on which you have installed the Teradata EDW. You must assign the INDCONN macro variable before the %INDTD_PUBLISH_FORMATS macro is invoked.

    Here is the syntax for the value of the INDCONN macro variable:

    SERVER="server" USER="userid" PASSWORD="password" DATABASE="database"

    Note:   The INDCONN macro variable is not passed as an argument to the %INDTD_PUBLISH_FORMATS macro. Consequently, this information can be concealed in your SAS job. You might want to place it in an autoexec file and set the permissions on the file so that others cannot access the user ID and password.  [cautionend]

  2. Run the %INDTD_PUBLISH_FORMATS macro. For more information, see %INDTD_PUBLISH_FORMATS Macro Syntax.

    Messages are written to the SAS log that indicate whether the SAS_PUT() function was successfully created.

Note:   USER librefs that are not assigned to WORK might cause unexpected or unsuccessful behavior.  [cautionend]


%INDTD_PUBLISH_FORMATS Macro Syntax

%INDTD_PUBLISH_FORMATS (
<DATABASE=database-name>
<, FMTCAT=format-catalog-filename>
<, FMTTABLE=format-table-name>
<, ACTION=CREATE | REPLACE | DROP>
<, MODE=PROTECTED | UNPROTECTED>
<, OUTDIR=diagnostic-output-directory>
);
Arguments

DATABASE=database-name

specifies the name of a Teradata database to which the SAS_PUT() function and the formats are published. This argument lets you publish the SAS_PUT() function and the formats to a shared database where other users can access them.

Interaction: The database that is specified by the DATABASE= argument takes precedence over the database that you specify in the INDCONN macro variable. For more information, see Running the %INDTD_PUBLISH_FORMATS Macro.
Tip: It is not necessary that the format definitions and the SAS_PUT() function reside in the same database as the one that contains the data that you want to format. You can use the SQLMAPPUTTO= system option to specify where the format definitions and the SAS_PUT() function are published. For more information, see SQLMAPPUTTO= System Option.
FMTCAT=format-catalog-filename

specifies the name of the format catalog file that contains all user-defined formats that were created with the FORMAT procedure and will be made available in Teradata.

Default: If you do not specify a value for FMTCAT= and you have created user-defined formats in your SAS session, the default is WORK.FORMATS. If you do not specify a value for FMTCAT= and you have not created any user-defined formats in your SAS session, only the formats that SAS supplies are available in Teradata.
Interaction: If the format definitions that you want to publish exist in multiple catalogs, you must copy them into a single catalog for publishing.
Interaction: If you specify more than one format catalog using the FMTCAT argument, only the last catalog you specify is published.
Interaction: If you do not use the default catalog name (FORMATS) or the default library (WORK or LIBRARY) when you create user-defined formats, you must use the FMTSEARCH system option to specify the location of the format catalog. For more information, see PROC FORMAT in the Base SAS Procedures Guide.
See Also: User-Defined Formats in the Teradata EDW
FMTTABLE=format-table-name

specifies the name of the Teradata table that contains all formats that the %INDTD_PUBLISH_FORMATS macro creates and that the SAS_PUT() function supports. The table contains the columns in Format Table Columns.

Format Table Columns
Column Name Description
FMTNAME specifies the name of the format.
SOURCE specifies the origin of the format. SOURCE can contain one of these values:
SAS

supplied by SAS

PROCFMT

User-defined with PROC FORMAT

PROTECTED specifies whether the format is protected. PROTECTED can contain one of these values:
YES

Format was created with the MODE= option set to PROTECTED.

NO

Format was created with the MODE= option set to UNPROTECTED.

Default: If FMTTABLE is not specified, no table is created. You can see only the SAS_PUT() function. You cannot see the formats that are published by the macro.
Interaction: If ACTION=CREATE or ACTION=DROP is specified, messages are written to the SAS log that indicate the success or failure of the table creation or drop.
ACTION=CREATE | REPLACE | DROP

specifies that the macro performs one of these actions:

CREATE

creates a new SAS_PUT() function.

REPLACE

overwrites the current SAS_PUT() function, if a SAS_PUT() function is already registered or creates a new SAS_PUT() function if one is not registered.

DROP

causes the SAS_PUT() function to be dropped from the Teradata database.

Interaction: If FMTTABLE= is specified, both the SAS_PUT() function and the format table are dropped. If the table name cannot be found or is incorrect, only the SAS_PUT() function is dropped.
Default: CREATE.
Tip: If the SAS_PUT() function was defined previously and you specify ACTION=CREATE, you receive warning messages from Teradata. If the SAS_PUT() function was defined previously and you specify ACTION=REPLACE, a message is written to the SAS log indicating that the SAS_PUT() function has been replaced.
MODE=PROTECTED | UNPROTECTED

specifies whether the running code is isolated in a separate process in the Teradata database so that a program fault does not cause the database to stop.

Default: PROTECTED
Tip: Once the SAS formats are validated in PROTECTED mode, you can republish them in UNPROTECTED mode for a performance gain.
OUTDIR=diagnostic-output-directory

specifies a directory that contains diagnostic files.

Files that are produced include an event log that contains detailed information about the success or failure of the publishing process.

See: Special Characters in Directory Names

Tips for Using the %INDTD_PUBLISH_FORMATS Macro


Modes of Operation

There are two modes of operation when executing the %INDTD_PUBLISH_FORMATS macro: protected and unprotected. You specify the mode by setting the MODE= argument.

The default mode of operation is protected. Protected mode means that the macro code is isolated in a separate process in the Teradata database, and an error does not cause the database to stop. It is recommended that you run the %INDTD_PUBLISH_FORMATS macro in protected mode during acceptance tests.

When the %INDTD_PUBLISH_FORMATS macro is ready for production, you can rerun the macro in unprotected mode. Note that you could see a performance advantage when you republish the formats in unprotected mode


Special Characters in Directory Names

If the directory names that are used in the macros contain any of the following special characters, you must mask the characters by using the %STR macro quoting function. For more information, see the %STR function and macro string quoting topic in SAS Macro Language: Reference.

Special Characters in Directory Names
Character How to Represent
blank1 %str()
*2 %str(*)
; %str(;)
, (comma) %str(,)
= %str(=)
+ %str(+)
- %str(-)
> %str(>)
< %str(<)
^ %str(^)
| %str(|)
& %str(&)
# %str(#)
/ %str(/)
~ %str(~)
% %str(%%)
' %str(%')
" %str(%")
( %str(%()
) %str(%))
¬ %str(¬)
1 Only leading blanks require the %STR function, but you should avoid using leading blanks in directory names.

2 Asterisks (*) are allowed in UNIX directory names. Asterisks are not allowed in Windows directory names. In general, avoid using asterisks in directory names.

Here are some examples of directory names with special characters:

Examples of Special Characters in Directory Names
Directory Code Representation
c:\temp\Sales(part1)
c:\temp\Sales%str(%()part1%str(%))
c:\temp\Drug "trial" X
c:\temp\Drug %str(%")trial(%str(%") X
c:\temp\Disc's 50% Y
c:\temp\Disc%str(%')s 50%str(%%) Y
c:\temp\Pay,Emp=Z
c:\temp\Pay%str(,)Emp%str(=)Z


Teradata Permissions

Because functions are associated with a database, the functions inherit the access rights of that database. It could be useful to create a separate shared database for scoring functions so that access rights can be customized as needed. In addition, to publish the scoring functions in Teradata, you must have the following permissions:

CREATE FUNCTION

DROP FUNCTION

EXECUTE FUNCTION

ALTER FUNCTION

To obtain permissions, contact your database administrator.


Format Publishing Macro Example

%indtdpf;
%let indconn server="terabase" user="user1" password="open1" database="mydb";
%indtd_publish_formats(fmtcat= fmtlib.fmtcat);

This sequence of macros generates a .c and a .h file for each data type. The format data types that are supported are numeric (FLOAT, INT), character, date, time, and timestamp (DATETIME). The %INDTD_PUBLISH_FORMATS macro also produces a text file of Teradata CREATE FUNCTION commands that are similar to these:

CREATE FUNCTION sas_put
(d float, f varchar(64))
RETURNS varchar(256)
SPECIFIC sas_putn
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
NOT DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME
'SL!"jazxfbrs"'
'!CI!ufmt!C:\file-path\'
'!CI!jazz!C:\file-path\'
'!CS!formn!C:\file-path\';

After it is installed, you can call the SAS_PUT() function in Teradata by using SQL. For more information, see Using the SAS_PUT() Function in the Teradata EDW.


Using the SAS_PUT() Function in the Teradata EDW


Implicit Use of the SAS_PUT() Function

After you install the formats that SAS supplies in libraries inside the Teradata EDW and publish any custom format definitions that you created in SAS, you can access the SAS_PUT() function with your SQL queries.

If the SQLMAPPUTTO= system option is set to SAS_PUT and you submit your program from a SAS session, the SAS SQL processor maps PUT function calls to SAS_PUT() function references that Teradata understands.

This example illustrates how the PUT function is mapped to the SAS_PUT() function using implicit pass-through.

options sqlmapputto=sas_put;

libname dblib teradata user="sas" password="sas" server="sl96208"
   database=sas connection=shared;

   /*-- Set SQL debug global options --*/
   /*----------------------------------*/
   options sastrace=',,,d' sastraceloc=saslog;

   /*-- Execute SQL using Implicit Passthru --*/
   /*-----------------------------------------*/
   proc sql noerrorstop;
      title1 'Test SAS_PUT using Implicit Passthru ';
      select distinct
         PUT(PRICE,Dollar8.2) AS PRICE_C
            from dblib.mailorderdemo;
   quit;

These lines are written to the SAS log.

libname dblib teradata user="sas" password="sas" server="sl96208"
   database=sas connection=shared;

NOTE: Libref DBLIB was successfully assigned, as follows:
      Engine:        TERADATA
      Physical Name: sl96208

         /*-- Set SQL debug global options --*/
         /*----------------------------------*/
         options sastrace=',,,d' sastraceloc=saslog;

         /*-- Execute SQL using Implicit Passthru --*/
         /*-----------------------------------------*/
         proc sql noerrorstop;
           title1 'Test SAS_PUT using Implicit Passthru ';
           select distinct
              PUT(PRICE,Dollar8.2) AS PRICE_C
            from dblib.mailorderdemo
                ;

TERADATA_0: Prepared: on connection 0
SELECT * FROM sas."mailorderdemo"


TERADATA_1: Prepared: on connection 0
 select distinct cast(sas_put("sas"."mailorderdemo"."PRICE", 'DOLLAR8.2') 
 as char(8)) as "PRICE_C" from "sas"."mailorderdemo"

TERADATA: trforc: COMMIT WORK
ACCESS ENGINE:  SQL statement was passed to the DBMS for fetching data.

TERADATA_2: Executed: on connection 0
 select distinct cast(sas_put("sas"."mailorderdemo"."PRICE", 'DOLLAR8.2') 
 as char(8)) as "PRICE_C" from "sas"."mailorderdemo"

TERADATA: trget - rows to fetch: 9
TERADATA: trforc: COMMIT WORK

               Test SAS_PUT using Implicit Passthru                        9
                                        3:42 Thursday, September 25, 2008

                             PRICE_C
                             _______
                               $8.00
                              $10.00
                              $12.00
                              $13.59
                              $13.99
                              $14.00
                              $27.98
                              $48.99
                              $54.00

         quit;

Be aware of these items:

The SELECT DISTINCT clause executes inside Teradata, and the processing is distributed across all available data nodes. Teradata formats the price values with the $DOLLAR8.2 format and processes the SELECT DISTINCT clause using the formatted values.


Explicit Use of the SAS_PUT() Function

If you use explicit pass-through (direct connection to Teradata), you can use the SAS_PUT() function call in your SQL program.

This example shows the same query from Implicit Use of the SAS_PUT() Function and explicitly uses the SAS_PUT() function call.

proc sql noerrorstop;
   title1 'Test SAS_PUT using Explicit Passthru;
   connect to teradata (user=sas password=XXX database=sas server=sl96208);

   select * from connection to teradata
      (select distinct cast(sas_put("PRICE",'DOLLAR8.2') as char(8)) as
         "PRICE_C" from mailorderdemo);

disconnect from teradata;
quit;

The following lines are written to the SAS log.

 proc sql noerrorstop;
 title1 'Test SAS_PUT using Explicit Passthru ';
 connect to teradata (user=sas password=XXX database=sas server=sl96208);

 select * from connection to teradata
    (select distinct cast(sas_put("PRICE",'DOLLAR8.2') as char(8)) as
        "PRICE_C" from mailorderdemo);

            Test SAS_PUT using Explicit Passthru                        10
                                        13:42 Thursday, September 25, 2008

                          PRICE_C
                          _______
                            $8.00
                           $10.00
                           $12.00
                           $13.59
                           $13.99
                           $14.00
                           $27.98
                           $48.99
                           $54.00

disconnect from teradata; 
quit;

Note:   If you explicitly use the SAS_PUT() function in your code, it is recommended that you use double quotation marks around a column name to avoid any ambiguity with the keywords. For example, if you did not use double quotation marks around the column name, DATE, in this example, all date values would be returned as today's date.

select distinct
   cast(sas_put("price", 'dollar8.2') as char(8)) as "price_c",
   cast(sas_put("date", 'date9.1') as char(9)) as "date_d",
   cast(sas_put("inv", 'best8.') as char(8))  as "inv_n",
   cast(sas_put("name", '$32.') as char(32)) as "name_n"
from mailorderdemo;

  [cautionend]


Tips When Using the SAS_PUT() Function


Determining Format Publish Dates

You might need to know when user-defined formats or formats that SAS supplies were published. SAS supplies two special formats that return a datetime value that indicates when this occurred.

Note:   You must use the SQL pass-through facility to return the datetime value associated with the INTRINSIC-CRDATE and UFMT-CRDATE formats, as illustrated in this example:

proc sql noerrorstop;
   connect to &tera (&connopt);

title 'Publish date of SAS Format Library';
select * from connection to &tera
   (
      select sas_put(1, 'intrinsic-crdate.')
         as sas_fmts_datetime;
   );
title 'Publish date of user-defined formats';
select * from connection to &tera
   (
      select sas_put(1, 'ufmt-crdate.')
         as my_formats_datetime;
   );

disconnect from teradata;
quit;

  [cautionend]


Using the SAS_PUT() Function with SAS Web Report Studio

By default, SAS Web Report Studio uses a large query cache to improve performance. When this query cache builds a query, it removes any PUT functions before sending the query to the database.

In the third maintenance release for SAS 9.2, SAS Web Report Studio can run queries with PUT functions and map those PUT functions calls to SAS_PUT() function calls inside the Teradata EDW. To do this, you set the SAS_PUT custom property for a SAS Information Map that is used as a data source. The SAS_PUT custom property controls how SAS Web Report Studio uses the query cache and whether the PUT function calls are processed inside the Teradata EDW as SAS_PUT() function calls.

For more information about the SAS_PUT custom property, see the SAS Intelligence Platform: Web Application Administration Guide.

Previous Page | Next Page | Top of Page