Server-Side Sorting

Overview of Server-Side Sorting

In most instances, using a BY clause in SAS code submitted to an SPD Server table triggers a BY clause evaluation by SPD Server. This BY clause assertion to the SPD Server might or might not require sorting to produce the ordered row set that the BY clause requires. In some cases, a table index can be used to sort the rows to satisfy a BY clause.
For example, the input table to a PROC SORT step is sorted in server context (by the associated LIBNAME proxy). The rows are returned to PROC SORT in BY clause order. In this case, PROC SORT knows that the data is already ordered, and writes the data to the output table without sorting it again. Unfortunately, this approach still must send the data from the LIBNAME proxy to the SAS client and then back to the LIBNAME proxy. However, there are other ways to use an SPD Server SQL Pass-Through COPY statement to avoid the overhead of the data round-trip.
SPD Server attempts to use an index when performing a BY clause. The software looks specifically for an index that has variables in the order specified in the BY clause. On the surface this seems like a good idea: table row order is already determined because the keys in the index are ordered. SPD Server reads the keys in order from the index, and then returns the rows from the table, based on the row IDs that are stored with the index key values.
Use caution when using BY clauses on tables that have indexes on their BY columns. Using the index is not always a good idea. When no suitable index exists to determine BY clause order, SPD Server uses a parallel table scan sort that keeps the table row intact with the sort key. The time required to access a highly random distribution of row IDs (obtained by using the index) can greatly exceed the time required to sort the rows from scratch.
When you use a WHERE clause to filter the rows from an SPD Server table with a BY clause to order them in a desired way, SPD Server handles both the subsetting and the ordering for this request. In this case, the filtered rows that were qualified by the WHERE clause are fed directly into a sort step. Feeding the filtered rows into the sort step is part of the parallel WHERE clause evaluation. The final ordered row set is the result. In this case, the previous discussion of index use does not apply. Index use for WHERE clause filtering is very desirable and greatly improves the filtering performance that feeds into the sort step. Arbitrarily suppressing index use with a WHERE and BY combination should be avoided.

Suppressing the Use of Indexes

Suppress the use of indexes on the BY clause by using the SPDSNIDX=YES macro variable or by asserting the NOINDEX=YES table option. Suppressing the use of the index can significantly improve time required to process a BY clause in SPD Server.

Advantages of Implicit Server Sorts

An exceptional feature is the software's ability to execute ad hoc order-BY queries without pre-sorting the table on the BY variables. Many SAS job streams are structured with code that alternates PROC SORT followed by PROC xxxx invocations, where the PROC SORT step is needed only for the execution of the PROC xxxx step.
When sort order is relevant only to the following step, eliminate the PROC SORT step and just use the BY clause on the PROC xxxx step. This eliminates the extra data transfer (to PROC SORT from SPD Server and then back from PROC SORT to SPD Server) to store the sorted result. Even if SPD Server performs the sort associated with the PROC SORT, there is extra data transfer. The data's round trip from the server to the SAS client and back can impose a substantial time penalty.