Sample 24814: Subsetting Observations More Efficiently
If you need to subset or reduce the number of observations in a SAS data set, try using the WHERE statement in your procedure code. It can be more efficient than a subsetting IF statement. And although WHERE statements and subsetting IF statements produce identical results, the comparisons end
there.
For example, the subsetting IF statement can be used to process raw data files, as well as existing SAS data sets. A WHERE statement can be used only with variables in an existing SAS data set.
Another difference is that the WHERE statement first validates the condition to see whether the observation is to be kept before it is read into the Program Data Vector (PDV). (Note:The PDV serves as a temporary storage area for an observation.)This is an important efficiency consideration because only observations that meet the WHERE condition
are processed. In contrast, a subsetting IF statement reads in the observation and then validates the condition to see whether the observation is kept or not.
A third important distinction between the two techniques is that a WHERE statement can be placed in a procedure to avoid creating an extra SAS data set. A subsetting IF statement cannot.
Less efficient -- Two-step process using a subsetting IF statement:
DATA ELDERLY ;
SET PATIENTS ;
IF AGE > 65 ;
RUN ;
PROC PRINT DATA=ELDERLY ;
RUN ;
More efficient -- One-step process using a WHERE statement:
PROC PRINT DATA=PATIENTS ;
WHERE AGE > 65 ;
RUN ;
This tip is from Kirk Paul Lafler, Senior Consultant and founder of Software Intelligence Corp., a SAS Institute Quality Partner organization in Spring Valley, California. Lafler has 22 years of SAS programming and training experience, and is active in several SAS users groups. If you have a SAS tip you'd like to share, send it to techeditor@sas.com.
/* Less efficient -- Two-step process using a subsetting IF statement */
DATA ELDERLY ;
SET PATIENTS ;
IF AGE > 65 ;
RUN ;
PROC PRINT DATA=ELDERLY ;
RUN ;
/*
More efficient -- One-step process using a WHERE statement
*/
PROC PRINT DATA=PATIENTS ;
WHERE AGE > 65 ;
RUN ;
This sample shows how it can be efficient to use the WHERE statement in your procedure code to subset observations.
| Type: | Sample |
| Topic: | Non SAS Authors ==> Kirk Paul Lafler
|
| Date Modified: | 2007-08-27 08:27:39 |
| Date Created: | 2004-10-08 10:55:57 |
Operating System and Release Information
| SAS System | Base SAS | All | n/a | n/a |