In most instances, using
a BY clause in SAS code submitted to an SPD Server table triggers
a BY clause evaluation by SPD Server. This BY clause assertion to
the SPD Server might or might not require sorting to produce the ordered
rowset that the BY clause requires. In some cases, a table index can
be used to sort the rows to satisfy a BY clause.
For example, the input
table to a PROC SORT step is sorted in server context (by the associated
LIBNAME proxy). The rows are returned to PROC SORT in BY clause order.
In this case, PROC SORT knows that the data is already ordered, and
writes the data to the output table without sorting it again. Unfortunately,
this approach still must send the data from the LIBNAME proxy to the
SAS client and then back to the LIBNAME proxy. However, there are
other ways to use an SPD Server SQL pass-through COPY statement to
avoid the overhead of the data round-trip.
SPD Server attempts
to use an index when performing a BY clause. The software looks specifically
for an index that has variables in the order specified in the BY clause.
On the surface this seems like a good idea: table row order is already
determined because the keys in the index are ordered. SPD Server reads
the keys in order from the index, and then returns the rows from the
table, based on the row IDs that are stored with the index key values.
Use caution when using
BY clauses on tables that have indexes on their BY columns. Using
the index is not always a good idea. When no suitable index exists
to determine BY clause order, SPD Server uses a parallel table scan
sort that keeps the table row intact with the sort key. The time
required to access a highly random distribution of row IDs (obtained
by using the index) can greatly exceed the time required to sort the
rows from scratch.
When you use a WHERE
clause to filter the rows from an SPD Server table with a BY clause
to order them in a desired way, SPD Server handles both the subsetting
and the ordering for this request. In this case, the filtered rows
that were qualified by the WHERE clause are fed directly into a sort
step. Feeding the filtered rows into the sort step is part of the
parallel WHERE clause evaluation. The final ordered rowset is the
result. In this case, the previous discussion of index use does not
apply. Index use for WHERE clause filtering is very desirable and
greatly improves the filtering performance that feeds into the sort
step. Arbitrarily suppressing index use with a WHERE and BY combination
should be avoided.