Selecting Observations

The WHERE clause conditionally selects observations, within the range specification, according to conditions given in the expression. The general form of the WHERE clause is as follows:

WHERE variable comparison-op operand ;

where

variable

is a variable in the SAS data set.

comparison-op

is one of the following comparison operators:

<

less than

<=

less than or equal to

=

equal to

>

greater than

>=

greater than or equal to

=

not equal to

?

contains a given string

?

does not contain a given string

=:

begins with a given string

=*

sounds like or is spelled like a given string

operand

is a literal value, a matrix name, or an expression in parentheses.

WHERE comparison arguments can be matrices. For the following operators, the WHERE clause succeeds if all the elements in the matrix satisfy the condition:

 

=

?

<

<=

>

>=

For the following operators, the WHERE clause succeeds if any of the elements in the matrix satisfy the condition:

 

=

?

=:

=*

Logical expressions can be specified within the WHERE clause by using the AND (&) and OR (|) operators. The general form is as follows:

 

clause&clause

 

(for an AND clause)

 

clause | clause

 

(for an OR clause)

where clause can be a comparison, a parenthesized clause, or a logical expression clause that is evaluated by using operator precedence.

For example, to list the names of all males in the data set CLASS, use the following statement:

   > list all var{name} where(sex='M');

                            OBS                NAME
                         ------ ----------------------
                             2                 THOMAS
                             3                 JAMES
                             5                 JOHN
                             7                 ROBERT
                            10                 JEFFREY
                            12                 HENRY
                            13                 ALFRED
                            17                 RONALD
                            18                 WILLIAM
                            19                 PHILIP

The WHERE comparison arguments can be matrices. In the following cases that use the =* operator, the comparison is made to each name to find a string that sounds like or is spelled like the given string or strings:

   > n={name sex age};
   > list all var n where(name=*{"ALFRED","CAROL","JUDY"});

               OBS         NAME     SEX            AGE
             ----- ---------------- -------- ---------
               11          CAROL    F          14.0000
               13          ALFRED   M          14.0000
               14          JUDY     F          14.0000

   > list all var n where(name=*{"JON","JAN"});

                       OBS NAME     SEX            AGE
                    ------ -------- -------- ---------
                         4 JANE     F          12.0000
                         5 JOHN     M          12.0000

To list AGE, HEIGHT, and WEIGHT for all students in their teens, use the following statement:

   > list all var v where(age>12);

                        OBS       AGE    HEIGHT    WEIGHT
                     ------ --------- --------- ---------
                          8   13.0000   56.5000   84.0000
                          9   13.0000   65.3000   98.0000
                         10   13.0000   62.5000   84.0000
                         11   14.0000   62.8000  102.5000
                         12   14.0000   63.5000  102.5000
                         13   14.0000   69.0000  112.5000
                         14   14.0000   64.3000   90.0000
                         15   15.0000   62.5000  112.5000
                         16   15.0000   66.5000  112.0000
                         17   15.0000   67.0000  133.0000
                         18   15.0000   66.5000  112.0000
                         19   16.0000   72.0000  150.0000

Note: In the WHERE clause, the expression on the left side refers to values of the data set variables, and the expression on the right side refers to matrix values. You cannot use comparisons that involve more than one data set variable in a single comparison; for example, you cannot use either of the following expressions:

   list all where(height>weight);
   list all where(weight-height>0);

You could use the first statement if WEIGHT were a matrix name already defined rather than a variable in the SAS data set.