Sample 24804: %SQUEEZE-ing Before Compressing Data, Redux

The %SQUEEZE macro (see sample 268) was originally written to minimize the space required to store numeric variables in a SAS dataset. It computed the minimum length in bytes of each numeric variable such that no precision was lost in the values the variable contained, and then created a new dataset that used a LENGTH statement to assign those lengths to the numeric variables.
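As a rough illustration of the idea described above (not the macro's actual source), the minimum lossless length of a numeric variable can be found by testing each value with the TRUNC function at increasing byte lengths. The dataset WORK.DEMO, the variables x and y, and the macro variables len_x and len_y are hypothetical names used only for this sketch. It also assumes a Windows or UNIX host, where the smallest numeric length is 3 bytes.

```sas
/* Hypothetical demo data; not part of the original sample */
data work.demo;
   input x y;
   datalines;
1 0.1
200 2.5
8192 3
;
run;

/* Scan every row and keep, per variable, the largest length any value needs */
data _null_;
   set work.demo end=last;
   array vars {2} x y;
   array need {2} _temporary_ (3 3);   /* assumed minimum: 3 bytes (Windows/UNIX) */
   do i = 1 to dim(vars);
      if not missing(vars{i}) then do;
         len = need{i};
         /* TRUNC(value, len) returns the value as it would be stored in len bytes */
         do while (len < 8 and trunc(vars{i}, len) ne vars{i});
            len = len + 1;
         end;
         need{i} = len;
      end;
   end;
   if last then do;
      call symputx('len_x', need{1});
      call symputx('len_y', need{2});
   end;
run;

/* Re-create the dataset, assigning the computed lengths with a LENGTH statement */
data work.demo_small;
   length x &len_x y &len_y;
   set work.demo;
run;
```

Running PROC CONTENTS on WORK.DEMO and WORK.DEMO_SMALL afterward shows which variable lengths changed. Missing values are skipped here because they do not constrain the stored length.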

In response to a user's request for enhancements, the following changes have been made to the %SQUEEZE macro. Among other things, they extend its usefulness to character variables.

  • Character variables are now processed as well as numeric variables. The minimum length of a character variable is computed so that no characters contained in the variable are lost. A FORMAT statement is also generated for the variable so that its formatted length matches the computed length.

  • When numeric variables and character variables are processed, the positional order of the variables in the dataset is not altered. This feature will be useful to SAS users who are accustomed to seeing the contents of a dataset in a familiar sequence.

  • A NOCOMPRESS option was added to permit the exclusion of specified variables from the %SQUEEZE-ing process, should this be desired.
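The character-variable step in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the macro's own code. It uses SASHELP.CLASS as stand-in data, and the output dataset WORK.CLASS_SMALL and the macro variable len_name are hypothetical names. Because NAME is already the first variable in SASHELP.CLASS, placing the LENGTH statement before SET does not change the variable order in this particular case.

```sas
/* Longest trimmed value of NAME in SASHELP.CLASS */
proc sql noprint;
   select max(length(name)) into :len_name
   from sashelp.class;
quit;
%let len_name = &len_name;   /* strip any leading blanks left by INTO */

/* Shortening a character variable raises a truncation warning by default */
options varlenchk=nowarn;

data work.class_small;
   length name $ &len_name;     /* computed storage length */
   format name $&len_name..;    /* formatted length matches the storage length */
   set sashelp.class;
run;

options varlenchk=warn;   /* restore the default setting */
```

The LENGTH function returns 1 for a blank value, so the computed length is never 0. A general solution would also have to generate the LENGTH statements in the original variable order, which this sketch does not attempt.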

The following example demonstrates the use of %SQUEEZE:

libname sample 'C:\My SAS Files\SASData' ;
proc contents data=sample.mydata ; run ;
%SQUEEZE( sample.mydata, squozen, NOCOMPRESS=key1 key2 )
proc contents data=squozen ; run ;

My thanks to Kamran Jafry, Royal Bank of Canada, who suggested the changes included in the latest version of %SQUEEZE and generously agreed to test them.

About the Author
Ross Bettinger is a SAS Analytical Consultant. He provides support for Enterprise Miner and has been involved with data mining projects for 9 years. He has been a SAS user for 17 years. His professional interests are related to data mining, statistical analysis of data, feature selection and transformation, model building, and algorithm development.

The original 2001 %SQUEEZE Tip


These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.