
SAS Data Files

Compressing Data Files


Definition of Compression

Compressing a file is a process that reduces the number of bytes required to represent each observation. In a compressed file, each observation is a variable-length record, while in an uncompressed file, each observation is a fixed-length record.

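Advantages of compressing a file include reduced storage requirements for the file and fewer I/O operations needed to read from or write to the data during processing.

However, compressing a file also has disadvantages: more CPU resources are required to read a compressed file, because each observation must be uncompressed, and in some situations the resulting file can be larger rather than smaller.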

Requesting Compression

By default, a SAS data file is not compressed. To request compression, use the COMPRESS= data set option, the COMPRESS= option in the LIBNAME statement, or the COMPRESS= system option, with a value such as COMPRESS=CHAR or COMPRESS=BINARY.

When you create a compressed data file, SAS writes a note to the log indicating the percentage of reduction that is obtained by compressing the file. SAS obtains the compression percentage by comparing the size of the compressed file with the size of an uncompressed file of the same page size and record count.
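The three ways of requesting compression might look like the following sketch. The libref MYLIB, the output data set name CLASS_C, and the use of SASHELP.CLASS as input are illustrative assumptions, not part of the original text.

```sas
/* 1. System option: applies to data files created later in the session. */
options compress=yes;

/* 2. LIBNAME option: applies to data files created in this library.     */
/*    MYLIB is a hypothetical libref that points at the WORK directory.  */
libname mylib "%sysfunc(pathname(work))" compress=yes;

/* 3. Data set option: applies to this one output data file.             */
data mylib.class_c (compress=char);
   set sashelp.class;
run;

/* Restore the default so that later steps are not affected. */
options compress=no;
```

After the DATA step runs, the SAS log should contain the note about the compression percentage described above, or a note that compression was disabled if the file cannot benefit from it.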

After a file is compressed, the setting is a permanent attribute of the file, which means that to change the setting, you must re-create the file. That is, to uncompress a file, specify COMPRESS=NO for a DATA step that copies the compressed data file.
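As a minimal sketch of re-creating a file to change its compression setting, the steps below first build a compressed copy of SASHELP.CLASS and then copy it with COMPRESS=NO. The data set names CLASS_COMP and CLASS_PLAIN are hypothetical.

```sas
/* Create a compressed data file to work with (hypothetical name). */
data work.class_comp (compress=char);
   set sashelp.class;
run;

/* Uncompress by copying the file in a DATA step with COMPRESS=NO. */
data work.class_plain (compress=no);
   set work.class_comp;
run;

/* PROC CONTENTS reports the Compressed attribute of each file. */
proc contents data=work.class_comp;
run;
proc contents data=work.class_plain;
run;
```

The original compressed file is left unchanged. Only the new copy carries the new setting.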

For more information on the COMPRESS= data set option, the COMPRESS= option in the LIBNAME statement, and the COMPRESS= system option, see SAS Language Reference: Dictionary.


Disabling a Compression Request

Compressing a file adds a fixed-length block of data to each observation: 12 bytes on a 32-bit host and 24 bytes on a 64-bit host. Because of this overhead, some files become larger when they are compressed. Files with extremely short record lengths are a typical example.

When a request is made to compress a data set, SAS attempts to determine whether compression will increase the size of the file. SAS examines the lengths of the variables. If, due to the number and lengths of the variables, it is not possible for the compressed file to be at least 12 bytes (for a 32-bit host) or 24 bytes (for a 64-bit host) per observation smaller than an uncompressed version, compression is disabled and a message is written to the SAS log.

For example, here is a simple data set for which SAS determines that it is not possible for the compressed file to be smaller than an uncompressed one:

data one (compress=char);
   length x y $2;
   input x y;
   datalines;
ab cd
;
run;

The following output is written to the SAS log:

SAS Log Output When Compression Request Is Disabled

NOTE: Compression was disabled for data set WORK.ONE because compression overhead
         would increase the size of the data set.
NOTE: The data set WORK.ONE has 1 observations and 2 variables.
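The arithmetic behind this example can be sketched in a DATA _NULL_ step. This is a simplification for illustration, not the check that SAS actually performs. The variable names are hypothetical, and the sketch assumes that the uncompressed record length is the sum of the two variable lengths.

```sas
/* Sketch only: a simplified version of the size comparison. */
data _null_;
   record_len  = 2 + 2;   /* x and y are each $2 in WORK.ONE       */
   overhead_32 = 12;      /* per-observation overhead, 32-bit host */
   overhead_64 = 24;      /* per-observation overhead, 64-bit host */

   /* Compression can help only if the record is longer than the overhead. */
   can_help_32 = (record_len > overhead_32);
   can_help_64 = (record_len > overhead_64);

   put record_len= can_help_32= can_help_64=;
run;
```

Both flags evaluate to 0 because a 4-byte record cannot shrink by 12 or 24 bytes. This is consistent with the note that compression was disabled for WORK.ONE.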
