SAS
formats are basically mapping functions that change an element of
data from one format to another. For example, some SAS formats change
numeric values to various currency formats or date-and-time formats.
SAS supplies many formats.
You can also use the SAS FORMAT procedure to define custom formats
that replace raw data values with formatted character values. For
example, this PROC FORMAT code creates a custom format called $REGION
that maps ZIP codes to geographic regions.
proc format;
value $region
'02129', '03755', '10005' = 'Northeast'
'27513', '27511', '27705' = 'Southeast'
'92173', '97214', '94105' = 'Pacific';
run;
SAS programs, including in-database procedures, frequently
use both user-defined formats and formats that SAS supplies. Although
they are referenced in numerous ways, using the PUT function in the
SQL procedure is of particular interest for SAS In-Database processing.
The PUT function takes
a format reference and a data item as input and returns a formatted
value. This SQL procedure query uses the PUT function to summarize
sales by region from a table of all customers:
select put(zipcode,$region.) as region,
sum(sales) as sum_sales from sales.customers
group by region;
The SAS SQL processor
knows how to process the PUT function. Currently,
SAS/ACCESS Interface
to the database returns all rows of unformatted data in the SALES.CUSTOMERS
table in the database to the SAS System for processing.
The SAS In-Database
technology deploys, or publishes, the PUT function implementation
to the database as a new function named SAS_PUT().
Similar to any other programming language function, the SAS_PUT() function can take one or more input parameters and return
an output value.
The SAS_PUT() function supports use of SAS formats. You can specify the SAS_PUT() function in SQL queries that SAS submits to the database
in one of two ways:
-
implicitly by enabling SAS to automatically
map PUT function calls to SAS_PUT() function calls
-
explicitly by using the SAS_PUT() function directly in your SAS program
If you used the SAS_PUT() function in the previous SELECT statement,
the database formats the ZIP code values with the $REGION format and
processes the GROUP BY clause using the formatted values.
By publishing the PUT
function implementation to the database as the SAS_PUT() function, you can realize these advantages:
-
You can process the entire SQL
query inside the database, which minimizes data transfer (I/O).
-
The SAS format processing leverages
the scalable architecture of the DBMS.
-
The results are grouped by the
formatted data and are extracted from the database.
Deploying SAS formats
to execute inside a database can enhance performance and exploit the
database’s parallel processing.
Note: SAS formats and the SAS_PUT() functionality is available in DB2, Netezza, and Teradata.