Server-Side Sorting

Overview of Server-Side Sorting

In most instances, using a BY clause in SAS code submitted to a server table triggers a BY clause evaluation by the server. This BY clause assertion to the server might or might not require sorting to produce the ordered rowset that the BY clause requires. In some cases, a table index can be used to sort the rows to satisfy a BY clause.
For example, the input table to a PROC SORT step is sorted in server context (by the associated LIBNAME proxy). The rows are returned to PROC SORT in BY clause order. In this case, PROC SORT knows that the data is already ordered, and writes the data to the output table without sorting it again. Unfortunately, this approach still must send the data from the LIBNAME proxy to the SAS client and then back to the LIBNAME proxy. However, you can use a server SQL pass-through COPY statement to avoid the overhead of the data round-trip.
The server attempts to use an index when performing a BY clause. The software looks specifically for an index that has variables in the order specified in the BY clause. On the surface, this seems like a good idea: Table row order is already determined because the keys in the index are ordered. The server reads the keys in order from the index, and then returns the rows from the table, based on the row IDs that are stored with the index key values.
Use caution when using BY clauses on tables that have indexes on their BY columns. Using the index is not always a good idea. When no suitable index exists to determine BY clause order, the server uses a parallel table scan sort that keeps the table row intact with the sort key. The time required to access a highly random distribution of row IDs (obtained by using the index) can greatly exceed the time required to sort the rows from scratch.
When you use a WHERE clause to filter the rows from a server table with a BY clause to order them in a desired way, the server handles both the subsetting and the ordering for this request. In this case, the filtered rows that were qualified by the WHERE clause are fed directly into a sort step. Feeding the filtered rows into the sort step is part of the parallel WHERE clause evaluation. The final ordered rowset is the result. In this case, the previous discussion of index use does not apply. Index use for WHERE clause filtering is very desirable and greatly improves the filtering performance that feeds into the sort step. Arbitrarily suppressing index use with a WHERE and BY combination should be avoided.

Suppressing the Use of Indexes

Suppress the use of indexes on the BY clause by using the SPDSNIDX=YES macro variable or by asserting the NOINDEX=YES table option. Suppressing the use of the index can significantly improve time required to process a BY clause in the server.
Last updated: February 8, 2017