Sorting DBMS
data can be resource-intensive—whether you use the SORT procedure,
a BY statement, or an ORDER BY clause on a DBMS data source or in
the SQL procedure SELECT statement. Sort data only when it is needed
for your program.
Here are guidelines
for sorting data.
-
If you specify a BY statement in
a DATA or PROC step that references a DBMS data source, it is recommended
for performance reasons that you associate the BY variable (in a DATA
or PROC step) with an indexed DBMS column. If you reference DBMS
data in a SAS program and the program includes a BY statement for
a variable that corresponds to a column in the DBMS table, the
SAS/ACCESS
LIBNAME engine automatically generates an ORDER BY clause for that
variable. The ORDER BY clause causes the DBMS to sort the data before
the DATA or PROC step uses the data in a SAS program. If the DBMS
table is very large, this sorting can adversely affect your performance.
Use a BY variable that is based on an indexed DBMS column in order
to reduce this negative impact.
-
The outermost BY or ORDER BY clause
overrides any embedded BY or ORDER BY clauses, including those specified
by the
DBCONDITION= option,
those specified in a WHERE clause, and those in the selection criteria
in a view descriptor. In the following example, the EXEC_EMPLOYEES
data set includes a BY statement that sorts the data by the variable
SENIORITY. However, when that data set is used in the following PROC
SQL query, the data is ordered by the SALARY column and not by SENIORITY
libname mydblib oracle user=testuser password=testpass;
data exec_employees;
set mydblib.staff (keep=lname fname idnum);
by seniority;
where salary >= 150000;
run;
proc sql;
select * from exec_employees
order by salary;
-
Do not use PROC SORT to sort data
from SAS back into the DBMS because this impedes performance and has
no effect on the order of the data.
-
The database does not guarantee
sort stability when you use PROC SORT. Sort stability means that the
ordering of the observations in the BY statement is exactly the same
every time the sort is run against static data. If you absolutely
require sort stability, you must place your database data into a SAS
data set and use PROC SORT.
-
When you use PROC SORT, be aware
that the sort rules for SAS and for your DBMS might be different.
Use the Base SAS system option SORTPGM to specify which rules (host,
SAS, or DBMS) are applied:
sorts data according
to the DBMS sort rules, the host sort rules, and the SAS
sort rules. (Sorting uses the first available and pertinent sorting
algorithm in this list.) This is the default.
sorts data according
to host rules and then SAS rules. (Sorting uses the first available
and pertinent sorting algorithm in this list.)