When you use server
dynamic clusters, you can create huge tables. If a SAS job needs to
manipulate a huge table, you can sort the tables for more efficient
processing. Traditional processing of huge tables can overuse or overwhelm
available resources. Insufficient run-time or processor resources
can prohibit you from running full table scans and manipulating table
rows, which are required to sort huge tables for subsequent processing.
The server provides
dynamic cluster BY clause optimization to reduce the need for a large
amount of processor resources when evaluating BY clauses. Dynamic
cluster BY clause optimization uses the server to join individually
created server member tables so that the tables appear to be a single
table. But the individual member tables are also kept intact. Dynamic
cluster BY clause optimization uses the SORT attribute of the member
tables to bypass most of the sorting that is required to perform the
implicit BY clause ordering. The server uses the SORT attribute of
each member table to merge the member tables in the dynamic cluster
in order by each member table's order. No additional server workspace
is required, and the ordered table rows are returned quickly because
the member tables do not need to be sorted.
To use dynamic cluster BY clause optimization, you need to build the
dynamic cluster table a specific way. All of the member tables in your dynamic cluster table need to be
sorted by the same columns that you use in the BY clause. When you build your dynamic
cluster table from member tables that are presorted by your BY clause columns, your
dynamic cluster table can use the BY clause optimization.
You run a BY clause that matches the SORT attribute column order of the member tables
of the dynamic cluster table. The server processes the BY clause without using sort
workspace and does not experience
first-row latency. The server uses the presorted member tables to perform an instantaneous
interleave. Because dynamic cluster BY clause optimization uses the presorted member
tables, you can perform operations on huge tables that would be impossible to handle
otherwise.
For example, suppose
your system has sufficient CPU, memory, and workspace resources to
sort a 50 GB table in a reasonable amount of time. However, suppose
this system accumulates 50 GB of new data every month. After 12 months,
the table requires 600 GB of storage. The system cannot sort 600 GB
of data to process queries that are based on the previous 12-month
period. To use dynamic cluster BY clause optimization in this situation:
-
Create a dynamic cluster table from the twelve 50 GB member tables. You have a 600
GB dynamic cluster table.
-
Store data for each
successive month in a server member table.
-
Sort each table and add it to the 600 GB dynamic cluster table.
-
Use dynamic cluster BY clause optimization to run SAS steps that use BY clauses on
the 600 GB dynamic cluster table.
For example, you can run a DATA step MERGE statement that uses the dynamic cluster
table as the master source for the MERGE statement. The BY clause from the MERGE statement
triggers the dynamic cluster BY clause optimization. The operation completes in the
time that it takes to interleave the individual member tables. The process uses no
server workspace and does not cause any implicit BY sort delays.
Dynamic cluster BY clause
optimization is triggered when all member tables have an applicable
SORT attribute ordering for the BY clause that is asserted. When the
SORT attribute ordering is strong (validated), the server does not
verify the order of BY columns that are returned from the member table.
When the SORT attribute ordering is weak (such as from a SORTEDBY
assertion in a data set option), the server verifies the order of
BY columns that are returned from the member table. If the server
detects an invalid BY column order, it terminates the BY clause and
displays the following error message:
ERROR: Clustered BY member violates weaksort order during merge.