Partitioning Tables :: SAS(R) Visual Analytics 6.2: User's Guide

When you specify a SAS LASR Analytic Server or SAS Data in HDFS library as an output library, you can specify a partition key for the table by selecting a column from the Partition by menu.

Partitioning uses the formatted values of the partition key to group rows that have the same value for the key. All of the rows that have the same value for the key are loaded to a single machine in the cluster. For SAS LASR Analytic Server libraries, this means that the rows that have the same value for the key are in memory on one machine. For SAS Data in HDFS libraries, all of the rows that have the same value for the key are written to a single file block on one machine. (The block is replicated to other machines for redundancy.) When a partitioned table is loaded onto a server, it remains partitioned in memory.
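The grouping described above can be illustrated with a small conceptual sketch in Python. It is not how SAS LASR Analytic Server or SAS Data in HDFS implements partitioning: the function names, the CRC-based choice of machine, and the sample rows are all assumptions made for illustration. It shows only that the formatted value of the key, not the raw value, decides which rows end up together.

```python
import zlib
from collections import defaultdict
from datetime import date


def machine_for(formatted_value, n_machines):
    """Pick a machine index for a formatted key value (hypothetical scheme)."""
    return zlib.crc32(formatted_value.encode("utf-8")) % n_machines


def partition_rows(rows, key, fmt=str, n_machines=4):
    """Group rows by the formatted value of `key`; each group lands on one machine."""
    placement = defaultdict(lambda: defaultdict(list))
    for row in rows:
        value = fmt(row[key])  # the formatted value defines the group
        placement[machine_for(value, n_machines)][value].append(row)
    return placement


rows = [
    {"order_date": date(2013, 1, 5), "amount": 25.0},
    {"order_date": date(2013, 1, 20), "amount": 40.0},
    {"order_date": date(2013, 2, 3), "amount": 12.5},
]

# Formatting dates as year-month puts both January rows in the same partition.
placement = partition_rows(rows, "order_date", fmt=lambda d: d.strftime("%Y-%m"))
for machine, groups in sorted(placement.items()):
    for value, members in sorted(groups.items()):
        print(f"machine {machine}: key {value!r} -> {len(members)} row(s)")
```

In this sketch, two rows with different raw dates share a partition because their formatted values are identical, which is the behavior to keep in mind when a format is applied to the partition key.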

If you select a partition key and also specify sort options for columns on the Column Editor tab, the sort options are passed to the engine in an ORDERBY= option. This enhancement applies to SAS LASR Analytic Server and SAS Data in HDFS libraries and can improve performance once the data is in memory.
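As a rough illustration of what ordering within partitions means, the following Python sketch groups rows by a partition key and then sorts each group by the requested columns. The function name, column names, and data are hypothetical, and the sketch makes no claim about how the engine processes the ORDERBY= option internally.

```python
from collections import defaultdict


def partition_and_order(rows, partition_key, order_by):
    """Group rows by the partition key, then sort each group by the order_by columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[str(row[partition_key])].append(row)
    for group in partitions.values():
        # Sort only within the partition; rows never move between partitions.
        group.sort(key=lambda r: tuple(r[col] for col in order_by))
    return dict(partitions)


rows = [
    {"customer": 101, "year": 2013, "amount": 40.0},
    {"customer": 202, "year": 2012, "amount": 12.5},
    {"customer": 101, "year": 2012, "amount": 25.0},
    {"customer": 101, "year": 2012, "amount": 10.0},
]

ordered = partition_and_order(rows, "customer", order_by=["year", "amount"])
for key, group in ordered.items():
    print(key, [(r["year"], r["amount"]) for r in group])
```

Rows that are already sorted inside each partition can be read in order without a further sort, which is one plausible reason that ordering helps once the data is in memory.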

When you specify a partition key, avoid using a column that has few unique values. For example, partitioning by a Boolean flag column places all rows on no more than two machines because only two values are available. At the other end of the spectrum, partitioning a large table by a nearly unique key results in many partitions that each have few rows.
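One way to compare candidate keys before choosing one is to count how many partitions each would produce and how large they would be. The following Python sketch does this for three made-up columns; the helper name and the synthetic data are assumptions for illustration only.

```python
from collections import Counter


def partition_profile(values):
    """Summarize the partitions that a candidate key column would produce."""
    sizes = Counter(str(v) for v in values)
    return {
        "partitions": len(sizes),
        "largest": max(sizes.values(), default=0),
        "smallest": min(sizes.values(), default=0),
    }


n = 1000
candidates = {
    "flag": [i % 2 == 0 for i in range(n)],   # Boolean: only two distinct values
    "order_id": list(range(n)),               # unique: one row per partition
    "customer": [i % 50 for i in range(n)],   # moderate number of distinct values
}

for name, values in candidates.items():
    print(name, partition_profile(values))
```

In this synthetic data, the flag column yields two large partitions, the order ID yields one partition per row, and the customer column falls between the two extremes.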

Determining the optimal partition key can be challenging. As an example, if you tend to access data by customer ID, then you might improve performance by partitioning the data by customer.
