Using SAS Formats and the SAS_PUT( ) Function

SAS formats are basically mapping functions that change an element of data from one format to another. For example, some SAS formats change numeric values to various currency formats or date-and-time formats.
SAS supplies many formats. You can also use the SAS FORMAT procedure to define custom formats that replace raw data values with formatted character values. For example, this PROC FORMAT code creates a custom format called $REGION that maps ZIP codes to geographic regions.
proc format;
   value $region
   '02129', '03755', '10005' = 'Northeast'
   '27513', '27511', '27705' = 'Southeast'
   '92173', '97214', '94105' = 'Pacific';
run;
SAS programs, including in-database procedures, frequently use both user-defined formats and formats that SAS supplies. Although they are referenced in numerous ways, using the PUT function in the SQL procedure is of particular interest for SAS In-Database processing.
The PUT function takes a format reference and a data item as input and returns a formatted value. This SQL procedure query uses the PUT function to summarize sales by region from a table of all customers:
select put(zipcode,$region.) as region,
   sum(sales) as sum_sales from sales.customers
   group by region;
The SAS SQL processor knows how to process the PUT function. Currently, SAS/ACCESS Interface to the database returns all rows of unformatted data in the SALES.CUSTOMERS table in the database to the SAS System for processing.
The SAS In-Database technology deploys, or publishes, the PUT function implementation to the database as a new function named SAS_PUT( ). Similar to any other programming language function, the SAS_PUT( ) function can take one or more input parameters and return an output value.
The SAS_PUT( ) function supports use of SAS formats. You can specify the SAS_PUT( ) function in SQL queries that SAS submits to the database in one of two ways:
  • implicitly by enabling SAS to automatically map PUT function calls to SAS_PUT( ) function calls
  • explicitly by using the SAS_PUT( ) function directly in your SAS program
If you used the SAS_PUT( ) function in the previous SELECT statement, the database formats the ZIP code values with the $REGION format. It then processes the GROUP BY clause using the formatted values.
By publishing the PUT function implementation to the database as the SAS_PUT( ) function, you can realize these advantages:
  • You can process the entire SQL query inside the database, which minimizes data transfer (I/O).
  • The SAS format processing leverages the scalable architecture of the DBMS.
  • The results are grouped by the formatted data and are extracted from the database.
Deploying SAS formats to execute inside a database can enhance performance and exploit the database’s parallel processing.
Note: SAS formats and the SAS_PUT( ) functionality is available in Aster nCluster, DB2, Greenplum, Netezza, and Teradata.