Performance Considerations |
Sorting DBMS data can be resource-intensive--whether you use the SORT procedure, a BY statement, or an ORDER BY clause on a DBMS data source or in the SQL procedure SELECT statement. Sort data only when it is needed for your program.
Here are guidelines for sorting data.
If you specify a BY statement in a DATA or PROC step that references a DBMS data source, it is recommended for performance reasons that you associate the BY variable (in a DATA or PROC step) with an indexed DBMS column. If you reference DBMS data in a SAS program and the program includes a BY statement for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS LIBNAME engine automatically generates an ORDER BY clause for that variable. The ORDER BY clause causes the DBMS to sort the data before the DATA or PROC step uses the data in a SAS program. If the DBMS table is very large, this sorting can adversely affect your performance. Use a BY variable that is based on an indexed DBMS column in order to reduce this negative impact.
The outermost BY or ORDER BY clause overrides any embedded BY or ORDER BY clauses, including those specified by the DBCONDITION= option, those specified in a WHERE clause, and those in the selection criteria in a view descriptor. In the following example, the EXEC_EMPLOYEES data set includes a BY statement that sorts the data by the variable SENIORITY. However, when that data set is used in the following PROC SQL query, the data is ordered by the SALARY column and not by SENIORITY.
libname mydblib oracle user=testuser password=testpass; data exec_employees; set mydblib.staff (keep=lname fname idnum); by seniority; where salary >= 150000; run; proc sql; select * from exec_employees order by salary;
Do not use PROC SORT to sort data from SAS back into the DBMS because this impedes performance and has no effect on the order of the data.
The database does not guarantee sort stability when you use PROC SORT. Sort stability means that the ordering of the observations in the BY statement is exactly the same every time the sort is run against static data. If you absolutely require sort stability, you must place your database data into a SAS data set, and then use PROC SORT.
When you use PROC SORT, be aware that the sort rules for SAS and for your DBMS might be different. Use the Base SAS system option SORTPGM to specify which rules (host, SAS, or DBMS) are applied:
sorts data according to the DBMS sort rules, then the host sort rules, and then the SAS sort rules. (Sorting uses the first available and pertinent sorting algorithm in this list.) This is the default.
sorts data according to host rules and then SAS rules. (Sorting uses the first available and pertinent sorting algorithm in this list.)
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.