COMPRESS= Data Set Option

Specifies how observations are compressed in a new output SAS data set.

Valid in: DATA step and PROC steps
Category: Data Set Control
Restriction: Use with output data sets only.

Syntax

COMPRESS=NO | YES | CHAR | BINARY

Syntax Description

NO

specifies that the observations in a newly created SAS data set are uncompressed (maintaining fixed-length records).

YES | CHAR

specifies that the observations in a newly created SAS data set are compressed (producing variable-length records) by using RLE (Run Length Encoding). RLE compresses observations by reducing repeated runs of the same character (including blanks) to two-byte or three-byte representations.

Alias: ON

BINARY

specifies that the observations in a newly created SAS data set are compressed (producing variable-length records) by using RDC (Ross Data Compression). RDC combines run-length encoding and sliding-window compression to compress the file by representing repeated byte patterns more efficiently.

Note: This method is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data (character and numeric variables). Because the compression function operates on a single record at a time, the record length needs to be several hundred bytes or larger for effective compression.

Details

Compressing a file reduces the number of bytes that are required to represent each observation. Advantages of compressing a file include reduced storage requirements for the file and fewer I/O operations to read or write to the data during processing. However, more CPU resources are required to read a compressed file (because of the overhead of uncompressing each observation). There are situations where the resulting file size might increase rather than decrease.
Use the COMPRESS= data set option to compress an individual file. Specify the option for output data sets only. That is, specify it for data sets that are named in the DATA statement of a DATA step or in the OUT= option of a SAS procedure. Use the COMPRESS= data set option only when you are creating a SAS data file (member type DATA). You cannot compress SAS views, because they contain no data.
The COPY procedure does not support data set options. Therefore, you cannot use the COMPRESS= data set option in PROC COPY or in a COPY statement from PROC DATASETS.
Tip: To compress an output data set that is generated by PROC COPY, set the COMPRESS=YES system option before the PROC COPY step and specify the NOCLONE option in the PROC COPY statement.
options compress=yes;
proc copy in=work out=new noclone;
   select x;
run;
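
Outside of PROC COPY, the option is written in parentheses after the name of the output data set. The following is a minimal sketch of the OUT= case described above. The data set names work.raw and work.sorted and the variables id and note are hypothetical and are used only for illustration.

```sas
/* Hypothetical input: wide character values padded with blanks */
data work.raw;
   length note $ 300;
   do id = 1000 to 1 by -1;
      note = 'pending review';
      output;
   end;
run;

/* COMPRESS= applies to the output data set named in OUT= */
proc sort data=work.raw out=work.sorted(compress=yes);
   by id;
run;
```

The input data set work.raw is left as it was created. Only work.sorted is written with compressed observations.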

After a file is compressed, the setting is a permanent attribute of the file. To change the setting, you must re-create the file. That is, to uncompress a file, specify COMPRESS=NO for a DATA step that copies the compressed file.
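
As a sketch of that re-creation step, the following code assumes that the mylib libref is assigned and that mylib.CharRepeats already exists as a compressed file (it is created in Example 1). The output name CharRepeatsPlain is hypothetical.

```sas
/* Report the file attributes, including the compression setting */
proc contents data=mylib.CharRepeats;
run;

/* Re-create the file without compression under a new, hypothetical name */
data mylib.CharRepeatsPlain(compress=no);
   set mylib.CharRepeats;
run;
```

Writing to a new name keeps the compressed original available until you have checked the uncompressed copy.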
In general, COMPRESS=CHAR provides good compression when single bytes repeat; COMPRESS=BINARY provides good compression when strings of bytes repeat. Looking for repeated strings of bytes is more costly than looking for repeated single bytes. For examples, see Example 1: COMPRESS=CHAR and Example 2: COMPRESS=BINARY.

Comparisons

The COMPRESS= data set option overrides the COMPRESS= option in the LIBNAME statement and the COMPRESS= system option.
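
The precedence can be illustrated with a short sketch. The data set names work.packed and work.plain and the variables text and i are hypothetical.

```sas
options compress=yes;            /* session-wide default */

data work.packed;                /* no data set option: system option applies */
   length text $ 200;
   do i = 1 to 10000;
      text = 'repeat';
      output;
   end;
run;

data work.plain(compress=no);    /* data set option overrides the system option */
   set work.packed;
run;
```

Here work.packed is created under the system option setting, and work.plain is created uncompressed because the data set option takes precedence.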
The data set option POINTOBS=YES, which is the default, specifies that a compressed data set can be processed with random access (by observation number) rather than with sequential access only. With random access, you can specify an observation number in the FSEDIT procedure and use the POINT= option in the SET and MODIFY statements.
When you create a compressed file, you can also specify REUSE=YES (as a data set option or system option) to track and reuse space. With REUSE=YES, new observations are inserted in available space when other observations are updated or deleted. When the default REUSE=NO is in effect, new observations are appended to the existing file.
POINTOBS=YES and REUSE=YES are mutually exclusive: they cannot be in effect together. REUSE=YES takes precedence over POINTOBS=YES, so if you set REUSE=YES, SAS automatically sets POINTOBS=NO.
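
The two settings can be sketched side by side. All data set and variable names below (work.byobs, work.ledger, work.one, memo, acct, obsnum) are hypothetical.

```sas
/* Compressed file that keeps random access by observation number */
data work.byobs(compress=yes pointobs=yes);
   length memo $ 200;
   do acct = 1 to 10000;
      memo = 'open';
      output;
   end;
run;

/* Read a single observation directly with POINT= */
data work.one;
   obsnum = 500;
   set work.byobs point=obsnum;
   output;
   stop;                         /* required: POINT= never reaches end of file */
run;

/* Compressed file that reuses freed space; POINTOBS=NO is set by SAS */
data work.ledger(compress=yes reuse=yes);
   set work.byobs;
run;
```

Choose POINTOBS=YES when steps need direct access by observation number, and REUSE=YES when the file is updated often and space recovery matters more.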
The TAPE engine supports the COMPRESS= data set option, but the engine does not support the COMPRESS= system option.
The XPORT engine does not support compression.

Examples

Example 1: COMPRESS=CHAR

data mylib.CharRepeats(compress=char);
   length ca $ 200;
   do i=1 to 100000;
      ca='aaaaaaaaaaaaaaaaaaaaaa';
      cb='bbbbbbbbbbbbbbbbbbbbbb';
      cc='cccccccccccccccccccccc';
      output;
   end;
run;
The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.CHARREPEATS decreased size by 88.55 percent.
      Compressed is 45 pages; un-compressed would require 393 pages.

Example 2: COMPRESS=BINARY

data mylib.StringRepeats(compress=binary);
   length cabcd $ 200;
   do i=1 to 1000000;
      cabcd='abcdabcdabcdabcdabcdabcdabcdabcd';
      cefgh='efghefghefghefghefghefghefghefgh';
      cijkl='ijklijklijklijklijklijklijklijkl';
      output;
   end;
run;
The following message is written to the SAS log:
NOTE: Compressing data set MYLIB.STRINGREPEATS decreased size by 70.27 percent.
      Compressed is 1239 pages; un-compressed would require 4167 pages.

See Also

Compressing Data Files in SAS Language Reference: Concepts
Statements:
LIBNAME Statement in SAS Statements: Reference
System Options:
COMPRESS= System Option in SAS System Options: Reference
REUSE= System Option in SAS System Options: Reference