The RANK Procedure |
In-database processing has several advantages over processing within SAS. These advantages include increased security, reduced network traffic, and the potential for faster processing. Increased security is possible because sensitive data does not have to be extracted from the DBMS. Faster processing is possible because data is manipulated locally, on the DBMS, using high-speed secondary storage devices instead of being transported across a relatively slow network connection, because the DBMS might have more processing resources at its disposal, and because the DBMS might be capable of optimizing a query for execution in a highly parallel and scalable fashion.
In-database processing for PROC RANK supports DB2, Oracle, and Teradata database management systems.
The presence of table statistics might affect the performance of the RANK procedure's in-database processing. If your DBMS is not configured to automatically generate table statistics, then manual generation of table statistics might be necessary to achieve acceptable in-database performance.
Note:
For DB2, generation of table statistics (either automatic or manual) is highly recommended for all but the smallest input tables.
The TIES=CONDENSE option is not supported for the RANK procedure's in-database processing in an Oracle DBMS. If you use this option, it will prevent SQL generation and execution of in-database processing.
If the RANK procedure's input data set is a table or view that resides within a database from which rows would normally be retrieved with the SAS/ACCESS interface to Teradata, then PROC RANK can perform much or all of its work within the DBMS. There are several other factors that determine whether or not such in-database processing can occur. In-database processing will not occur in the following circumstances:
if the RENAME= data set option is specified on the input data set.
if a WHERE statement appears in the context of the RANK procedure or a WHERE= data set option is specified on the input data set, and the WHERE statement or option contains a reference to a SAS function that has no equivalent in the DBMS or a format that has not been installed for use by SAS within the DBMS.
if any variable specified on a BY statement has an associated format. Formatted BY variables are not supported by PROC RANK for in-database processing.
if a FORMAT statement appears within the procedure context and applies to a variable specified on a BY statement, then in-database processing cannot be performed. Formatted BY variables are not supported by RANK for in-database processing. With a DBMS, formats can be associated with variables only if a FORMAT or ATTRIB statement appears within the procedure context.
When PROC RANK can process data within the DBMS, it generates an SQL query. The structure of the SQL query that is generated during an in-database invocation of PROC RANK depends on several factors, including the ranking methods that are used, the number of variables that are ranked, the inclusion of BY and WHERE statements, and the PROC RANK options that are used, such as TIES= and DESCENDING. The SQL query expresses the required calculations and is submitted to the DBMS. The results of this query will either remain as a new table within the DBMS if the output of the RANK procedure is directed there, or it will be returned to SAS. The settings for the MSGLEVEL option and the SQLGENERATION= option determine whether messages will be printed to the SAS log, which indicates whether in-database processing was performed. Generated SQL can be examined by setting the SASTRACE= option. For more information, see the SASTRACE option in SAS/ACCESS for Relational Databases: Reference.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.