Working with SAS Data Sets


Process Data by Using the WHERE Clause

Several SAS/IML statements support a WHERE clause that selects observations that satisfy specified criteria. The WHERE clause is supported by the following statements:

The WHERE clause conditionally selects observations that satisfy some criterion. The general form of the WHERE clause is

  • WHERE variable comparison-op operand;

The arguments to the WHERE clause are as follows:

variable

is a variable in the SAS data set.

comparison-op

is one of the following comparison operators:

<

less than

<=

less than or equal to

=

equal to

>

greater than

>=

greater than or equal to

^=

not equal to

?

contains a given string

^?

does not contain a given string

= :

begins with a given string

= *

sounds like or is spelled like a given string

operand

is a literal value, a matrix name, or an expression in parentheses.

For example, a typical use of the WHERE clause is to subset data:

proc iml;
use Sashelp.Class where(age>14);
read all var {Age Weight} into X;
close Sashelp.Class;
print X[colname={"Age" "Weight"}];

Figure 7.24: Observations That Satisfy a Criterion

X
Age Weight
15 112.5
15 112
16 150
15 133
15 112



You can also use a WHERE clause in the READ statement. For example, to conduct BY-group processing of all the students in the Sashelp.Class data set, first call the FREQ procedure to find the unique BY groups, and then use a WHERE clause in a DO loop to read observations from each BY group, as shown in the following example:

/* find unique BY combinations of Age and Sex */
proc freq data=Sashelp.Class;
tables Age*Sex / out=freqout
              nocum norow nocol nopercent;
run;

proc iml;
/* read unique BY groups */
use freqout;
read all var {Age Sex};
close freqout;

use Sashelp.Class;            /* open data set for reading */
MeanHeight = j(nrow(Sex), 1); /* allocate vector for results */
do i = 1 to nrow(Age);        /* for each BY group */
   /* read data for the i_th group */
   read all var {Height} where(Sex=(sex[i]) & Age=(age[i]));
   MeanHeight[i] = mean(Height); /* analyze this BY group */
end;
close Sashelp.Class;

print Age Sex MeanHeight[format=4.1];

Figure 7.25: BY-Group Processing

The FREQ Procedure

Frequency
Table of Age by Sex
Age Sex
F M Total
11
1
1
2
12
2
3
5
13
2
1
3
14
2
2
4
15
2
2
4
16
0
1
1
Total
9
10
19

Age Sex MeanHeight
11 F 51.3
11 M 57.5
12 F 58.1
12 M 60.4
13 F 60.9
13 M 62.5
14 F 63.6
14 M 66.3
15 F 64.5
15 M 66.8
16 M 72.0



The operand argument in a WHERE comparison can be a matrix. For the following operators, the WHERE clause succeeds if any of the elements in the matrix satisfy the condition:

 

=,   ?,   = :,   = *

For the following operators, the WHERE clause succeeds only if all the elements in the matrix satisfy the condition:

 

^=,   ^?,   <,   <=,   >,   >=

You can specify logical expressions within the WHERE clause by using the AND (&) and OR (|) operators. If clause is a valid WHERE expression, then you can combine expressions as follows:

 

Both conditions satisfied

 

clause1 & clause2

 

Either condition satisfied

 

clause1 | clause2

In the WHERE clause, the expression on the left side of a comparison operator refers to values of the data set variables, whereas the expression on the right side is a constant or SAS/IML matrix. Expressions that involve more than one data set variable in a single clause are not supported. For example, you cannot use either of the following expressions:

list all where(weight>height);  /* not supported */
list all where(weight-height>0);/* not supported */