
Optimizing Data Storage

Compressing Data

Compression is a process that reduces the number of bytes that are required to represent each table row. In a compressed file, each row is a variable-length record. In an uncompressed file, each row is a fixed-length record. Compressed tables contain an internal index that maps each row number to a disk address so that the application can access data by row number. This internal index is transparent to the user. Compressed tables have the same access capabilities as uncompressed tables.

Here are some advantages of compressing a file:

Here are some disadvantages of compressing a file:

These are the types of compression that you can specify:

You can compress these types of tables:

Note: The SPD Engine compresses the data component (.dpf) file by blocks as the engine creates the file. (The data component file stores partitions for an SPD Engine table.) To specify the number of observations to store in a compressed block, use the IOBLOCKSIZE= table option in addition to the COMPRESS= table option. For example, in the Table Options field in the table properties dialog box, you might enter COMPRESS=YES IOBLOCKSIZE=10000. The default block size is 4096 (4k).
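
As a rough illustration of block-wise compression with lookup by row number, the following Python sketch groups fixed-length rows into blocks, compresses each block, and keeps an index of block offsets so that a single row can still be fetched by its number. It is not the SPD Engine's implementation: zlib stands in for the engine's compression algorithms, the sample data is made up, and the names compress_rows, read_row, and rows_per_block are hypothetical (rows_per_block only loosely plays the role of IOBLOCKSIZE=).

```python
import zlib


def compress_rows(rows, rows_per_block):
    """Compress rows block by block and return (data, index).

    index[i] holds the (offset, length) of compressed block i within data.
    """
    data = bytearray()
    index = []
    for start in range(0, len(rows), rows_per_block):
        block = b"".join(rows[start:start + rows_per_block])
        packed = zlib.compress(block)
        index.append((len(data), len(packed)))
        data.extend(packed)
    return bytes(data), index


def read_row(data, index, row_number, rows_per_block, row_length):
    """Fetch one row by its zero-based row number."""
    block_number, position = divmod(row_number, rows_per_block)
    offset, length = index[block_number]
    # Only the block that contains the requested row is decompressed.
    block = zlib.decompress(data[offset:offset + length])
    return block[position * row_length:(position + 1) * row_length]


# Hypothetical sample data: 25 fixed-length rows of 16 bytes each.
ROW_LENGTH = 16
rows = [f"row{n:04d}".encode().ljust(ROW_LENGTH) for n in range(25)]
data, index = compress_rows(rows, rows_per_block=10)
```

In this sketch, a larger rows_per_block gives the compressor more data to work with per block, but reading a single row then requires decompressing a larger block. That trade-off is a property of the sketch, not a documented statement about the SPD Engine.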

When you create a compressed table, SAS records in the log the percentage of reduction that is obtained by compressing the file. SAS obtains the compression percentage by comparing the size of the compressed file with the size of an uncompressed file of the same page size and record count. After a file is compressed, the setting is a permanent attribute of the file, which means that to change the setting, you must re-create the file. For example, to uncompress a SAS table in SAS Data Integration Studio, select Default (NO) for the Compressed option in the table properties dialog box.
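
The percentage reported in the log can be thought of as a simple size comparison. Here is a minimal Python sketch, assuming the reduction is expressed as the difference between the uncompressed and compressed sizes relative to the uncompressed size. The exact formula that SAS uses is not stated here, and compression_percentage is a hypothetical helper, not a SAS function.

```python
def compression_percentage(uncompressed_bytes, compressed_bytes):
    """Return the size reduction as a percentage of the uncompressed size.

    A negative result means the compressed file is larger than the
    uncompressed file would have been.
    """
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return 100.0 * (uncompressed_bytes - compressed_bytes) / uncompressed_bytes
```

For example, under this assumption a file that would occupy 1000 bytes uncompressed and occupies 750 bytes compressed shows a 25 percent reduction.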

For more information about compression, see SAS Language Reference: Dictionary.
