COMPRESS= Data Set Option

Specifies whether to compress SPD Engine data sets on disk as they are created.

Valid in: DATA step and PROC step
Default: NO
Restriction: Cannot be used with ENCRYPT=YES or ENCRYPT=AES
Interactions: IOBLOCKSIZE= Data Set Option, PADCOMPRESS= Data Set Option
Engine: SPD Engine only

Syntax

COMPRESS=NO | YES | CHAR | BINARY

Required Arguments

NO

performs no data set compression.

YES | CHAR

specifies that data in an SPD Engine data set be compressed in blocks by using RLE (run-length encoding). RLE compresses data by reducing repeated runs of the same character (including a blank space) to two-byte or three-byte representations.

BINARY

specifies that data in an SPD Engine data set be compressed in blocks by using RDC (Ross Data Compression). RDC combines RLE and sliding window compression to compress the file by representing repeated byte patterns more efficiently.

Note: This method is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data (character and numeric variables).
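
To make the run-length idea behind COMPRESS=YES | CHAR concrete, the following DATA step is a toy sketch that collapses each run of a repeated character into a count followed by that character. It illustrates the principle only. It is not the SPD Engine's on-disk encoding, and the variable names (s, encoded, ch, count) are invented for the illustration.

```sas
/* Toy run-length encoder: illustrates the idea only,   */
/* not the format that the SPD Engine writes to disk.   */
data _null_;
   length s $ 32 encoded $ 96 ch $ 1;
   s = 'aaaaabbbcccccccc';
   n = length(s);               /* length without trailing blanks */
   encoded = '';
   i = 1;
   do while (i <= n);
      ch = substr(s, i, 1);     /* character that starts this run */
      count = 1;
      /* extend the run while the next character matches ch */
      do while (i + count <= n and substr(s, i + count, 1) = ch);
         count = count + 1;
      end;
      encoded = cats(encoded, count, ch);   /* append count, then character */
      i = i + count;            /* jump to the start of the next run */
   end;
   put s= encoded=;
run;
```

The sample string contains runs of five, three, and eight identical characters, so the encoded form needs only three count-and-character pairs. Longer runs, such as trailing blanks in wide character variables, shrink the most.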

Details

When you specify COMPRESS=YES | BINARY | CHAR, the SPD Engine compresses, by blocks, the data component file as it is created. To specify the size of the compressed blocks, use the IOBLOCKSIZE= Data Set Option when you create the data set. To add padding to the newly compressed blocks, specify PADCOMPRESS= Data Set Option when creating or updating the data set. For more information, see Compressing SPD Engine Data Sets.
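
As a minimal sketch of how the three options can be combined when a data set is created, the following step sets the compression method, the block size, and the padding together. The library path, the data set name Orders, and the numeric values are assumptions chosen for illustration, not recommendations.

```sas
/* Hypothetical library path: substitute a directory that exists on your system */
libname mylib spde '/tmp/spdedata';

data mylib.Orders(compress=binary      /* compress blocks with RDC              */
                  ioblocksize=65536    /* block size in bytes (arbitrary value) */
                  padcompress=100);    /* pad each compressed block (bytes)     */
   length note $ 200;
   do id = 1 to 10000;
      note = 'pending review pending review';
      output;
   end;
run;
```

Padding leaves room for a compressed block to grow when it is updated, which can reduce the number of overflow blocks at the cost of some disk space.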
The SPD Engine does not support user-specified compression. If you are migrating a default Base SAS engine data set that is both compressed and encrypted, the encryption is retained, but the compression is dropped.
In general, COMPRESS=CHAR provides good compression when single bytes repeat; COMPRESS=BINARY provides good compression when strings of bytes repeat. At the same time, it is more costly to look for repeating strings of bytes than to look for repeating single bytes. For examples, see Example 1: COMPRESS=CHAR and Example 2: COMPRESS=BINARY.
The CONTENTS procedure identifies the compress setting. If the data set is compressed, PROC CONTENTS prints information about the compression. The following list explains the Compressed Info fields in the CONTENTS procedure output:
PROC CONTENTS Compressed Section
Number of compressed blocks
number of compressed blocks that are required to store data.
Raw data blocksize
compressed block size in bytes calculated from the size specified in the IOBLOCKSIZE= data set option. It is the largest multiple of the observation length that fits in the block size.
Number of blocks with overflow
number of compressed blocks that needed more space. When data is updated and the compressed new block is larger than the compressed old block, an overflow block fragment is created.
Max overflow chain length
largest number of overflows for a single block. For example, the maximum overflow chain length would be 2 if a compressed block was updated and became larger, and then updated again to a larger size.
Block number for max chain
number of the block containing the largest number of overflow blocks.
Min overflow area
minimum amount of disk space that an overflow requires.
Max overflow area
maximum amount of disk space that an overflow requires.
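
To display the fields described above for a data set of your own, run PROC CONTENTS against it. The sketch below assumes that the MYLIB SPD Engine library is already assigned and that MYLIB.CharRepeats was created with COMPRESS=CHAR, as in Example 1.

```sas
/* Assumes MYLIB is an assigned SPD Engine library and that   */
/* MYLIB.CharRepeats was created with COMPRESS=CHAR.          */
proc contents data=mylib.CharRepeats;
run;
```

A nonzero value for Number of blocks with overflow after updates is a hint that padding specified with PADCOMPRESS= might be worth considering.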
Accessing compressed files usually requires more processing time. Compressed blocks must be decompressed before they are read and, if they are updated, compressed again before they are written to disk.

Comparisons

The COMPRESS= data set option overrides the COMPRESS= LIBNAME statement option and the COMPRESS= system option.
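
The precedence described above can be sketched as follows. The library path and the data set names are hypothetical, and the sketch assumes that COMPRESS= is accepted in the LIBNAME statement for the SPD Engine in your release, as the comparison implies.

```sas
options compress=no;                                /* system option           */
libname mylib spde '/tmp/spdedata' compress=char;   /* LIBNAME option;         */
                                                    /* hypothetical path       */

/* No data set option here, so the LIBNAME or system setting applies */
data mylib.A;
   length txt $ 100;
   do i = 1 to 1000;
      txt = 'aaaaaaaaaaaaaaaaaaaa';
      output;
   end;
run;

/* The data set option takes precedence for this data set only */
data mylib.B(compress=binary);
   length txt $ 100;
   do i = 1 to 1000;
      txt = 'abcdabcdabcdabcdabcd';
      output;
   end;
run;
```

Running PROC CONTENTS on each data set is one way to confirm which compression setting was actually used.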

Examples

Example 1: COMPRESS=CHAR

data mylib.CharRepeats(compress=char);
   length ca $ 200;
   do i=1 to 100000;
      ca='aaaaaaaaaaaaaaaaaaaaaa';
      cb='bbbbbbbbbbbbbbbbbbbbbb';
      cc='cccccccccccccccccccccc';
      output;
   end;
run;

The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.CHARREPEATS decreased size by 88.55 percent.
      Compressed is 45 pages; un-compressed would require 393 pages.

Example 2: COMPRESS=BINARY

data mylib.StringRepeats(compress=binary);
   length cabcd $ 200;
   do i=1 to 1000000;
      cabcd='abcdabcdabcdabcdabcdabcdabcdabcd';
      cefgh='efghefghefghefghefghefghefghefgh';
      cijkl='ijklijklijklijklijklijklijklijkl';
      output;
   end;
run;

The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.STRINGREPEATS decreased size by 70.27 percent.
      Compressed is 1239 pages; un-compressed would require 4167 pages.