WHERE= Data Set Option

Specifies specific conditions to use to select rows from a data set.

Valid in: DATA and PROC steps
Category: Observation Control
Restriction: Cannot be used with the POINT= option in the SET and MODIFY statements.
Supports: All

Syntax

Syntax Description

where-expression

is an arithmetic or logical expression that consists of a sequence of operators, operands, and SAS functions. An operand is a column, a SAS function, or a constant. An operator is a symbol that requests a comparison, logical operation, or arithmetic calculation. The expression must be enclosed in parentheses.

logical-operator

can be AND, AND NOT, OR, or OR NOT.

Details

  • Use the WHERE= data set option with an input data set to select rows that meet the condition specified in the WHERE expression before SAS brings them into the DATA or PROC step for processing. Selecting rows that meet the conditions of the WHERE expression is the first operation SAS performs in each iteration of the DATA step.
    You can also select rows that are written to an output data set. In general, selecting rows at the point of input is more efficient than selecting them at the point of output. However, there are some cases when selecting rows at the point of input is not practical or not possible.
  • You can apply OBS= and FIRSTOBS= processing to WHERE processing.
  • You cannot use the WHERE= data set option with the POINT= option in the SET and MODIFY statements.
  • If you use both the WHERE= data set option and the WHERE statement in the same DATA step, SAS ignores the WHERE statement for data sets with the WHERE= data set option. However, you can use the WHERE= data set option with the WHERE command in SAS/FSP software.
Note: Using indexed SAS data sets can improve performance significantly when you are using WHERE expressions to access a subset of the rows in a SAS data set.

Comparisons

  • The WHERE statement applies to all input data sets, whereas the WHERE= data set option selects rows only from the data set for which it is specified.
  • Do not confuse the purpose of the WHERE= data set option. The DROP= and KEEP= data set options select columns for processing. The WHERE= data set option selects rows.

Examples

Example 1: Selecting Rows from an Input Data Set

This example uses the WHERE= data set option to subset the Sales data set as it is read into another data set:
data sales;
   input product $ sales store $;
   datalines;
gizmo 234  parkview
gizmo 303  central
gizmo 124  mountain
gizmo 524  lakeside
whizmo 234 mountain
whizmo 273 lakeside
whizmo 234 parkview
whizmo 233 central
spintop 23 parkview
spintop 83 central
spintop 22 mountain
spintop 44 lakeside
;
data myfiles.whizmo;
   set myfiles.sales (where=(product='whizmo'));
run;
proc print data=myfiles.whizmo;
   title 'whizmo data set';
run;

Example 2: Selecting Rows from an Output Data Set

This example uses the WHERE= data set option to subset the SALES output data set:
data myfiles.whizmo (where=(product='whizmo'));
   set myfiles.sales;
run;
proc print data=myfiles.whizmo;
   title 'whizmo data set';
run;