PARTSIZE= Table Option

Specifies the size of the table partitions.

Valid in: CREATE TABLE statement
Category: Table Control
Data source: SPD Engine data set, SPD Server table

Syntax

PARTSIZE= n

Arguments

n

specifies the size of the partition.

Requirements The size can be specified in megabytes, gigabytes, and terabytes. If n is specified without M, G, or T, the default is megabytes. For example, PARTSIZE=128 is the same as PARTSIZE=128M. For SPD Engine data sets, the default value is 128 megabytes. The maximum value is 8,796,093,022,207 megabytes. For SPD Server tables, the default value is the setting of the MINPARTSIZE= server option. If the MINPARTSIZE= server option is not set, the default is 16 megabytes.
When creating an SPD Engine data set, if the row length is greater than 65K, you might find that the PARTSIZE= that you specify and the actual partition size do not match. To get these numbers to match, specify a PARTSIZE= that is a multiple of 32 and the row length.

Details

The option PARTSIZE= forces the software to partition data files at the specified size. The actual size of the partition is computed to accommodate the maximum number of rows that fit in the specified size of n megabytes.
Multiple partitions are necessary to read data in parallel. The value is specified when a table is created. This size is a fixed size. This specification applies only to the data component files. Splitting (partitioning) the data portion of a table into fixed-sized files can introduce a high degree of scalability for some operations. The software can spawn threads in parallel (for example, up to one thread per partition for WHERE evaluations). Separate data partitions also enable the engine to process the data without the overhead of file access contention between the threads. Because each partition is one file, the trade-off for a small partition size is that an increased number of files (for example, UNIX i-nodes) are required to store the observations.
The partition size determines a unit of work for many of the parallel operations that require full data set scans. But, having more partitions does not always mean faster processing. The trade-offs involve balancing the increased number of physical files (partitions) that are required to store the data set against the amount of work that can be done in parallel by having more partitions. More partitions means more open files to process the data set, but a smaller number of rows in each partition.
A general rule is to have 10 or fewer partitions per data path, and 3 to 4 partitions per CPU. (Some operating systems have a limit on the number of open files that you can use.)
To determine an adequate partition size for a new table, you should be aware of the following:
  • the types of applications that run against the data
  • how much data you have
  • how many CPUs are available to the applications
  • which disks are available for storing the partitions
  • the relationships of these disks to the CPUs
For example, if each CPU controls only one disk, then an appropriate partition size would be one in which each disk contains approximately the same amount of data. If each CPU controls two disks, then an appropriate partition size would be one in which the load is balanced. Each CPU does approximately the same amount of work.
Ultimately, scalability limits using PARTSIZE= depend on how the DATAPATH= is structured. For information about the DATAPATH= for SPD Engine data sets, see SAS Scalable Performance Data Engine: Reference. For information about the DATAPATH= for SPD Server tables, see SAS Scalable Performance Data Server: Administrator’s Guide. Specifically, the limits depend on how you configure and spread the DATAPATH= file systems across striped volumes. You should spread each individual volume's striping configuration across multiple disk controllers or SCSI channels in the disk storage array. The goal for the configuration is, at the hardware level, to maximize parallelism during data retrieval.
The PARTSIZE= specification is limited by the MINPARTSIZE= system option. MINPARTSIZE= ensures that an over-zealous SAS user does not create arbitrarily small partitions, thereby generating a large number of files. The default MINPARTSIZE= value for both SPD Engine data sets and SPD Server tables is 16 megabytes and probably should not be lowered much beyond this value. Many SPD Server sites specify a higher MINPARTSIZE= value in their server parameter file. A recommended MINPARTSIZE= setting for SPD Server tables when large tables are used is 1 gigabyte.
Note: The PARTSIZE= value for a data set cannot be changed after a data set is created. To change the PARTSIZE=, you must delete the table with the DROP TABLE statement and create it again with the CREATE TABLE statement.

Example: Specifying PARTSIZE= for an SPD Engine Data Set

You have 100 gigabytes of data and 8 disks, so you can store 12.5 gigabytes per disk. Optimally, you want 3 to 4 partitions per disk. A partition size of 3.125 gigabytes is appropriate. So, you can specify PARTSIZE=3200M.
create table salecent.sw {options partsize=3200m};
Last updated: February 23, 2017