The SORT Procedure |
When the DATA= input data set is stored as a table or view in a database management system (DBMS), the PROC SORT procedure can use in-database processing to sort the data. In-database processing can provide the advantages of faster processing and reduced data transfer between the database and SAS software.
PROC SORT performs in-database processing using SQL explicit pass-through. The Pass-Through Facility uses SAS/ACCESS to connect to a DBMS and to send statements directly to the DBMS for execution. This facility lets you use the SQL syntax of your DBMS. For details, see "Pass-Through Facility for Relational Databases" in SAS/ACCESS for Relational Databases: Reference.
In the third maintenance release for SAS 9.2, in-database processing is used by PROC SORT when a combination of procedure and system options are properly set. When system option SORTPGM=BEST, system option SQLGENERATION= is set to cause in-database processing, and when the PROC SORT NODUPKEY option is specified, PROC SORT generates a DBMS SQL query that sorts the data. The sorted results can either remain as a new table within the DBMS or can be returned to SAS. To view the SQL queries generated, set the SASTRACE= option.
The SAS System Option SORTPGM= can also be used without setting the SQLGENERATION option to instruct PROC SORT to use either the DBMS, SAS, or the HOST to perform the sort. If SORTPGM=BEST is specified, then either the DBMS, SAS, or HOST will perform the sort. The observation ordering that is produced by PROC SORT will depend on whether the DBMS or SAS performs the sorting.
If the DBMS performs the sort, then the configuration and characteristics of the DBMS sorting program will affect the resulting data order. The DBMS configuration settings and characteristics that can affect data order include character collation, ordering of NULL values, and sort stability. Most database management systems do not guarantee sort stability, and the sort might be performed by the DBMS regardless of the state of the SORTEQUALS/NOSORTEQUALS system option and EQUALS/NOEQUALS procedure option.
If you set the SAS system option SORTPGM= to SAS, then unordered data is delivered from the DBMS to SAS and SAS performs the sorting. However, consistency in the delivery order of observations from a DBMS is not guaranteed. Therefore, even though SAS can perform a stable sort on the DBMS data, SAS cannot guarantee that the ordering of observations within output BY groups will be the same from one PROC SORT execution to the next. To achieve consistency in the ordering of observations within BY groups, first populate a SAS data set with the DBMS data, then use the EQUALS or SORTEQUALS option to perform a stable sort.
In-database processing is affected by the following circumstances:
When any of the PROC SORT options, NODUPRECS, SORTSEQ=, or DUPOUT=, are specified, no in-database processing occurs.
For in-database processing, the OUT= procedure option must be specified and the output data set cannot refer to the input table on the DBMS.
If the SQLGENERATION system option is set to cause in-database processing and both the NODUPKEY and NODUPRECS procedure options are specified, the NODUPRECS option is ignored and in-database processing occurs.
LIBNAME options and data set options can also affect whether or not in-database processing occurs and what type of query will be generated. See "Overview of In-Database Procedures" in SAS/ACCESS for Relational Databases: Reference for a complete list of these options. The user can also set OPTIONS MSGLEVEL=I in SAS to see which options prevent or affect in-database processing.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.