Working with SAS Data Sets

Selecting Observations

The WHERE clause conditionally selects observations, within the range specification, according to conditions given in the expression. The general form of the WHERE clause is as follows:

WHERE variable comparison-op operand ;

where
variable
is a variable in the SAS data set.

comparison-op
is one of the following comparison operators:


<
less than

<=
less than or equal to

=
equal to

>
greater than

>=
greater than or equal to

^=
not equal to

?
contains a given string

^?
does not contain a given string

=:
begins with a given string

=*
sounds like or is spelled like a given string
operand
is a literal value, a matrix name, or an expression in parentheses.

WHERE comparison arguments can be matrices. For the following operators, the WHERE clause succeeds if all the elements in the matrix satisfy the condition:

 ^=^?<<=>>=

For the following operators, the WHERE clause succeeds if any of the elements in the matrix satisfy the condition:

 =?=:=*

Logical expressions can be specified within the WHERE clause by using the AND (&) and OR (|) operators. The general form is as follows:

 clause&clause (for an AND clause)
 clause | clause (for an OR clause)

where clause can be a comparison, a parenthesized clause, or a logical expression clause that is evaluated by using operator precedence.

For example, to list the names of all males in the data set CLASS, use the following statement:

  
    > list all var{name} where(sex='M'); 
  
  
  
  
  
                             OBS                NAME 
                          ------ ---------------------- 
                              2                 THOMAS 
                              3                 JAMES 
                              5                 JOHN 
                              7                 ROBERT 
                             10                 JEFFREY 
                             12                 HENRY 
                             13                 ALFRED 
                             17                 RONALD 
                             18                 WILLIAM 
                             19                 PHILIP
 
The WHERE comparison arguments can be matrices. In the following cases that use the =* operator, the comparison is made to each name to find a string that sounds like or is spelled like the given string or strings:
  
    > n={name sex age}; 
    > list all var n where(name=*{"ALFRED","CAROL","JUDY"}); 
  
                OBS         NAME     SEX            AGE 
              ----- ---------------- -------- --------- 
                11          CAROL    F          14.0000 
                13          ALFRED   M          14.0000 
                14          JUDY     F          14.0000 
  
    > list all var n where(name=*{"JON","JAN"}); 
  
                        OBS NAME     SEX            AGE 
                     ------ -------- -------- --------- 
                          4 JANE     F          12.0000 
                          5 JOHN     M          12.0000
 

To list AGE, HEIGHT, and WEIGHT for all students in their teens, use the following statement:

  
    > list all var v where(age>12); 
  
                         OBS       AGE    HEIGHT    WEIGHT 
                      ------ --------- --------- --------- 
                           8   13.0000   56.5000   84.0000 
                           9   13.0000   65.3000   98.0000 
                          10   13.0000   62.5000   84.0000 
                          11   14.0000   62.8000  102.5000 
                          12   14.0000   63.5000  102.5000 
                          13   14.0000   69.0000  112.5000 
                          14   14.0000   64.3000   90.0000 
                          15   15.0000   62.5000  112.5000 
                          16   15.0000   66.5000  112.0000 
                          17   15.0000   67.0000  133.0000 
                          18   15.0000   66.5000  112.0000 
                          19   16.0000   72.0000  150.0000
 
Note: In the WHERE clause, the expression on the left side refers to values of the data set variables, and the expression on the right side refers to matrix values. You cannot use comparisons involving more than one data set variable in a single comparison; for example, you cannot use either of the following expressions:
  
    list all where(height>weight); 
    list all where(weight-height>0);
 
You could use the first statement if WEIGHT were a matrix name already defined rather than a variable in the SAS data set.

Previous Page | Next Page | Top of Page