When
you specify a SAS LASR Analytic Server
or SASHDAT
library as an output library, you can specify a
partition key for the table. You can select a column to use from the
Partition by menu.
Partitioning uses the
formatted values of the partition key to group rows that have the same value for the key. All of the
rows that have the
same value for the key are loaded to a single machine in the
cluster. For SAS LASR Analytic Server
libraries, this means that the rows that have the same value for the key are in memory on one
machine. For SASHDAT libraries, all of the rows that have the same value for the key
are written to a
single file block on one machine. (The block is replicated to other machines for
redundancy.) When the
partitioned table is loaded onto a server, the partitioning remains when it is in memory.
If you select a partition key and also specify sort options for columns on the
Column Editor tab, the sort options are passed to the current
engine in an ORDERBY= option. This enhancement applies to SAS LASR Analytic Server
and SASHDAT libraries and can improve performance once the data is in memory.
When you specify a partition key, avoid using a variable that has few unique values.
For example, partitioning by
a flag column that is Boolean results in all rows on two machines because only two
values are available. At the other end of the spectrum, partitioning large tables
by a nearly
unique key results in many partitions that have few rows.
Determining the optimal partition key can be a challenging task. However, as an example,
if you tend to access data based
on a customer ID, then you might improve performance by partitioning the data by customer.