In most instances, using
a BY clause in SAS code submitted to a server table triggers a BY
clause evaluation by the server. This BY clause assertion to the server
might or might not require sorting to produce the ordered rowset that
the BY clause requires. In some cases, a table index can be used to
sort the rows to satisfy a BY clause.
For example, the input
table to a PROC SORT step is sorted in server context (by the associated
LIBNAME proxy). The rows are returned to PROC SORT in BY clause order.
In this case, PROC SORT knows that the data is already ordered, and
writes the data to the output table without sorting it again. Unfortunately,
this approach still must send the data from the LIBNAME proxy to the
SAS client and then back to the LIBNAME proxy. However, you can use
a server SQL pass-through COPY statement to avoid the overhead of
the data round-trip.
The server attempts
to use an index when performing a BY clause. The software looks specifically
for an index that has variables in the order specified in the BY clause.
On the surface, this seems like a good idea: Table row order is already
determined because the keys in the index are ordered. The server reads
the keys in order from the index, and then returns the rows from the
table, based on the row IDs that are stored with the index key values.
Use caution when using
BY clauses on tables that have indexes on their BY columns. Using
the index is not always a good idea. When no suitable index exists
to determine BY clause order, the server uses a parallel table scan
sort that keeps the table row intact with the sort key. The time
required to access a highly random distribution of row IDs (obtained
by using the index) can greatly exceed the time required to sort the
rows from scratch.
When you use a
WHERE clause to filter the rows from a server table with a BY clause to order them in a desired
way, the server handles both the subsetting and the ordering for this request. In
this case, the filtered rows that were qualified by the WHERE clause are fed directly
into a sort step. Feeding the filtered rows into the sort step is part of the parallel
WHERE clause evaluation. The final ordered rowset is the result. In this case, the
previous discussion of index use does not apply. Index use for WHERE clause filtering
is very desirable and greatly improves the filtering performance that feeds into the
sort step. Arbitrarily suppressing index use with a WHERE and BY combination should
be avoided.