Compressing Data Files

Definition of Compression

Compressing a file is a process that reduces the number of bytes required to represent each observation. In a compressed file, each observation is a variable-length record, while in an uncompressed file, each observation is a fixed-length record.
Advantages of compressing a file include the following:
  • reduced storage requirements for the file
  • fewer I/O operations necessary to read from or write to the data during processing
There are disadvantages to compressing a file. For example:
  • More CPU resources are required to read a compressed file because of the overhead of uncompressing each observation.
  • In some cases, such as files with very short observations, compression increases the file size rather than decreasing it.

Requesting Compression

By default, a SAS data file is not compressed. To compress, you can use these options:
  • COMPRESS= system option to compress all data files that are created during a SAS session
  • COMPRESS= option in the LIBNAME statement to compress all data files for a particular SAS library
  • COMPRESS= data set option to compress an individual data file
To compress a data file, you can specify the following:
  • COMPRESS=CHAR to use the RLE (Run Length Encoding) compression algorithm
  • COMPRESS=BINARY to use the RDC (Ross Data Compression) algorithm
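The three scopes listed above can be sketched as follows. This is a minimal illustration, not a recommended configuration: the libref MYLIB, the placeholder path, and the output data set name WORK.CLASS_C are hypothetical, and the example assumes that the SASHELP.CLASS sample data set is available in your installation.

```sas
/* Session scope: data files created after this statement are compressed with RLE */
options compress=char;

/* Library scope: data files created in this library are compressed with RDC.  */
/* MYLIB and the path are placeholders; substitute a real library location.     */
libname mylib 'path-to-your-library' compress=binary;

/* Data set scope: only this output data file is compressed */
data work.class_c (compress=binary);
   set sashelp.class;
run;
```

Because the data set option applies to a single file, it is the narrowest of the three requests and is convenient for trying compression on one file before changing a library or session setting.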
When you create a compressed data file, SAS writes a note to the log indicating the percentage of reduction that is obtained by compressing the file. SAS obtains the compression percentage by comparing the size of the compressed file with the size of an uncompressed file of the same page size and record count.
After a file is compressed, the setting is a permanent attribute of the file, which means that to change the setting, you must re-create the file. For example, to uncompress a file, specify COMPRESS=NO for the output data set in a DATA step that copies the compressed data file.
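As a sketch of re-creating a file to remove compression, the following steps first create a compressed file and then copy it with COMPRESS=NO. The data set names WORK.CLASS_C and WORK.CLASS_U are hypothetical, and the example assumes that the SASHELP.CLASS sample data set is available.

```sas
/* Create a compressed data file to work with */
data work.class_c (compress=char);
   set sashelp.class;
run;

/* Re-create the file without compression by copying it */
data work.class_u (compress=no);
   set work.class_c;
run;

/* Inspect the file attributes, including the compression setting */
proc contents data=work.class_u;
run;
```

The PROC CONTENTS step is included only to check the result; the attribute list in its output reports whether the file is compressed.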
For more information about the COMPRESS= data set option, see SAS Data Set Options: Reference. For more information about the COMPRESS= option in the LIBNAME statement, see SAS Statements: Reference. For more information about the COMPRESS= system option, see SAS System Options: Reference.

Disabling a Compression Request

Compressing a file adds a fixed-length block of data to each observation. Because of this additional block (12 bytes per observation on a 32-bit host and 24 bytes per observation on a 64-bit host), compressing some files increases their size. For example, files with extremely short record lengths can become larger when compressed.
When a request is made to compress a data set, SAS attempts to determine whether compression will increase the size of the file. SAS examines the lengths of the variables. If, due to the number and lengths of the variables, it is not possible for the compressed file to be at least 12 bytes (for a 32-bit host) or 24 bytes (for a 64-bit host) per observation smaller than an uncompressed version, compression is disabled and a message is written to the SAS log.
For example, here is a simple data set for which SAS determines that it is not possible for the compressed file to be smaller than an uncompressed one:
data one (compress=char);
   length x y $2;
   input x y;
   datalines;
ab cd
;
run;
The following output is written to the SAS log:
SAS Log Output When Compression Request Is Disabled
NOTE: Compression was disabled for data set WORK.ONE because compression 
      overhead would increase the size of the data set.
NOTE: The data set WORK.ONE has 1 observations and 2 variables.
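In the example above, each observation holds two 2-byte character variables, or 4 bytes in total, which is less than the per-observation overhead on either host type. For contrast, here is a sketch of a data set whose observations are long enough for a compression request to be considered. The data set name WORK.NOTES and the variable names are hypothetical, and the reduction that SAS reports depends on the data and the host.

```sas
/* Each observation holds a 200-byte character variable that is mostly */
/* trailing blanks, which run-length encoding can shorten.             */
data work.notes (compress=char);
   length comment $200;
   do i = 1 to 1000;
      comment = 'ok';
      output;
   end;
run;
```

When a step like this runs, check the SAS log for the note that reports the percentage of reduction rather than assuming that compression helped.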