SAS formats are basically
mapping functions that change an element of data from one format to
another. For example, some SAS formats change numeric values to various
currency formats or date-and-time formats.
SAS supplies many formats.
You can also use the SAS FORMAT procedure to define custom formats
that replace raw data values with formatted character values. For
example, this PROC FORMAT code creates a custom format called $REGION
that maps ZIP codes to geographic regions.
proc format;
value $region
'02129', '03755', '10005' = 'Northeast'
'27513', '27511', '27705' = 'Southeast'
'92173', '97214', '94105' = 'Pacific';
run;
SAS programs, including in-database
procedures, frequently use both user-defined formats and formats that
SAS supplies. Although they are referenced in numerous ways, using
the PUT function in the SQL procedure is of particular interest for
SAS In-Database processing.
The PUT function takes
a format reference and a data item as input and returns a formatted
value. This SQL procedure query uses the PUT function to summarize
sales by region from a table of all customers:
select put(zipcode,$region.) as region,
sum(sales) as sum_sales from sales.customers
group by region;
The SAS SQL processor
knows how to process the PUT function. Currently,
SAS/ACCESS Interface
to the database returns all rows of unformatted data in the SALES.CUSTOMERS
table in the database to the SAS System for processing.
The SAS In-Database
technology deploys, or publishes, the PUT function implementation
to the database as a new function named SAS_PUT( ). Similar
to any other programming language function, the SAS_PUT( )
function can take one or more input parameters and return an output
value.
The SAS_PUT( )
function supports use of SAS formats. You can specify the SAS_PUT( )
function in SQL queries that SAS submits to the database in one of
two ways:
-
implicitly by enabling SAS to automatically
map PUT function calls to SAS_PUT( ) function calls
-
explicitly by using the SAS_PUT( )
function directly in your SAS program
If you used the SAS_PUT( )
function in the previous SELECT statement, the database formats the
ZIP code values with the $REGION format. It then processes the GROUP
BY clause using the formatted values.
By publishing the PUT
function implementation to the database as the SAS_PUT( ) function,
you can realize these advantages:
-
You can process the entire SQL
query inside the database, which minimizes data transfer (
I/O).
-
The SAS format processing leverages
the scalable architecture of the DBMS.
-
The results are grouped by the
formatted data and are extracted from the database.
Deploying SAS formats
to execute inside a database can enhance performance and exploit the
database’s parallel processing.
Note: SAS formats and the SAS_PUT( )
functionality is available in Aster
nCluster,
DB2, Greenplum, Netezza, and Teradata.