Options That Affect Disk Space

ASYNCINDEX=

COMPRESS=

PARTSIZE=

ASYNCINDEX=

Specifies whether to create indexes in parallel when creating multiple indexes on an SPD Server table.

Syntax

ASYNCINDEX=YES|NO

Default: NO

Corresponding Macro Variable

SPDSIASY

Arguments

YES

creates the indexes in parallel.

NO

creates a single index at a time.

Description

SPD Server can create multiple indexes for a table at the same time. To do this, it launches one thread for each index and runs the threads concurrently. Although creating indexes in parallel is much faster, the default for this option is NO because parallel creation requires additional sort work space that might not be available.

For a complete description of the benefits and tradeoffs of creating multiple indexes in parallel, see SPDSIASY=.

Example

Because the disk work space required for parallel index creation is available, direct SPD Server to create the X, Y, and COMP indexes for table A in parallel.

PROC DATASETS lib=mydatalib;
   modify a(asyncindex=yes);
   index create x;
   index create y;
   index create comp=(x y);
   quit;
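The same behavior can presumably be requested for a whole session through the corresponding macro variable instead of the table option. The following is a minimal sketch, assuming that SPDSIASY accepts the same YES and NO values as ASYNCINDEX= and that the libref and table from the example above already exist:

```sas
/* Assumption: SPDSIASY takes the same YES|NO values as ASYNCINDEX= */
%let SPDSIASY=YES;

proc datasets lib=mydatalib;
   modify a;                  /* no table option needed on MODIFY */
   index create x;
   index create y;
   index create comp=(x y);   /* composite index on X and Y */
quit;
```

Setting the macro variable avoids repeating the table option on every MODIFY statement, at the cost of applying to every subsequent index creation in the session.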

COMPRESS=

Compresses SPD Server tables on disk.

Syntax

COMPRESS=NO|YES|CHAR|BINARY

Default: NO

Use in Conjunction with Table Option

IOBLOCKSIZE=

Corresponding Macro Variable

SPDSDCMP=

Arguments

YES | CHAR

specifies that the observations in a newly created data set are compressed by SAS using run-length encoding (RLE). RLE compresses observations by reducing repeated consecutive characters (including blanks) to 2-byte or 3-byte representations. Use the YES or CHAR argument to enable RLE compression for character data. The two arguments are functionally identical and interchangeable.

BINARY

specifies that the observations in a newly created data set are compressed by SAS using Ross Data Compression (RDC). RDC combines run-length encoding and sliding-window compression to compress the file. Use the BINARY argument to compress binary and numeric data. This method is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data (numeric variables).

Description

When COMPRESS= is assigned YES, SPD Server compresses newly created tables by 'blocks' based on the algorithm specified. To control the amount of compression, use the table option IOBLOCKSIZE=. This option specifies the number of rows that you want to store in the block.

When COMPRESS=BINARY is specified, both numeric data and character data are compressed.

If you specify values for both the COMPRESS= table option and the corresponding SPDSDCMP= macro variable, the SPDSDCMP= macro variable setting overrides the COMPRESS= table option setting.

Note: After a compressed table is created, you cannot change its block size. To resize the block, you must use PROC COPY to copy the table to a new table, setting IOBLOCKSIZE= to the block size that you want for the output table.

Example 1: COMPRESS=YES

data mylib.CharRepeats(COMPRESS=YES);
   length ca $ 200;
   do i=1 to 100000;
      ca='aaaaaaaaaaaaaaaaaaaaaa';
      cb='bbbbbbbbbbbbbbbbbbbbbb';
      cc='cccccccccccccccccccccc';
      output;
   end;
run;

The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.CHARREPEATS decreased size by 93.34 percent.

Example 2: COMPRESS=BINARY

data mylib.StringRepeats(COMPRESS=BINARY);
   length cabcd $ 200;
   do i=1 to 1000000;
      cabcd='abcdabcdabcdabcdabcdabcdabcdabcd';
      cefgh='efghefghefghefghefghefghefghefgh';
      cijkl='ijklijklijklijklijklijklijklijkl';
      output;
   end;
run;

The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.STRINGREPEATS decreased size by 97.85 percent.

PARTSIZE=

Specifies the size of an SPD Server table partition.

Syntax

PARTSIZE=n

Default: 16 MB for domains that are not Hadoop domains, 128 MB for Hadoop domains

Corresponding Macro Variable

SPDSSIZE=

Arguments

n

is the size of the partition in megabytes.

Description

Specifying PARTSIZE= forces the software to partition (split) SPD Server tables at the given size. The actual size is computed to accommodate the largest number of rows that will fit in the specified size of n Mbytes.
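The row-fitting computation described above can be illustrated with simple arithmetic. The following DATA step is only a sketch: the row length of 48 bytes is a hypothetical value, and it assumes that a megabyte is counted as 1,048,576 bytes, which this section does not state.

```sas
data _null_;
   partsize_mb = 32;    /* PARTSIZE= value in megabytes */
   row_bytes   = 48;    /* hypothetical row length in bytes */

   /* largest whole number of rows that fits in the requested size */
   rows_per_partition = floor(partsize_mb * 1024 * 1024 / row_bytes);

   /* bytes actually occupied by those rows */
   actual_bytes = rows_per_partition * row_bytes;

   put rows_per_partition= actual_bytes=;
run;
```

Under these assumptions, the partition holds slightly less than the requested size whenever the row length does not divide it evenly.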

Use this option to improve the performance of WHERE clause evaluation on non-indexed table columns and of SQL GROUP BY processing. By splitting the data portion of an SPD Server table at fixed-size intervals, the software can introduce a high degree of scalability for these operations, because it can launch threads in parallel to perform the evaluation on different partitions of the table without the threat of file access contention between the threads. There is, however, a price for the table splits: an increased number of files, which are required to store the rows of the table.

The PARTSIZE= specification is limited by MINPARTSIZE=, an SPD Server parameter maintained by the SPD Server administrator. MINPARTSIZE= ensures that an overzealous SAS user does not create arbitrarily small partitions, thereby generating a large number of files. The default for MINPARTSIZE= is 16 MB, and it probably should not be set much lower than this value.

If you use PARTSIZE= to specify a partition size for a domain that is not a Hadoop domain, the value of PARTSIZE= must be greater than the value declared for MINPARTSIZE= in the SPD Server parameter file in order to have any effect.

If you use PARTSIZE= to specify a partition size for a Hadoop domain, the value of PARTSIZE= must be greater than the larger of the value declared for MINPARTSIZE= and 128 MB in order to have any effect.

Note: The PARTSIZE= value for a table cannot be changed after the table is created. To change the partition size, you must use PROC COPY to copy the table and specify a different PARTSIZE= setting for the new (output) table.
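One way to repartition an existing table might look like the following sketch. It assumes that the SPDSSIZE= macro variable supplies the partition size for tables created afterward, including PROC COPY output, and it uses a hypothetical libref SPDSNEW that is already assigned to a different SPD Server domain. PROC COPY requires the input and output librefs to differ.

```sas
/* Assumption: SPDSSIZE= sets the partition size, in megabytes, */
/* for tables that are created after it is set.                 */
%let SPDSSIZE=64;

/* SPDSNEW is a hypothetical libref for the output domain */
proc copy in=spdscen out=spdsnew;
   select hr80spds;   /* copy only the table to be repartitioned */
run;
```

As with PARTSIZE=, the requested size is expected to take effect only if it exceeds the MINPARTSIZE= value set by the administrator.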

Example

Using PROC SQL, extract a set of rows from an existing table to create a non-indexed table with a partition size of 32 Mbytes in a SAS job:

PROC SQL;
create table SPDSCEN.HR80SPDS(partsize=32)
   as select
      state,
      age,
      sex,
      hour89,
      industry,
      occup
   from SPDSCEN.PRECS
   where hour89 > 40;
quit;