Syntax of WHERE Expression

WHERE Expression Contents

A WHERE expression is a type of SAS expression that defines a condition for selecting observations. A WHERE expression can be as simple as a single variable name or a constant (which is a fixed value). A WHERE expression can be a SAS function, or it can be a sequence of operands and operators that define a condition for selecting observations. In general, the syntax of a WHERE expression is as follows:
WHERE operand <operator> <operand>
operand
something to be operated on. An operand can be a variable, a SAS function, or a constant. See Specifying an Operand.
operator
a symbol that requests a comparison, logical operation, or arithmetic calculation. All SAS expression operators are valid for a WHERE expression, which include arithmetic, comparison, logical, minimum and maximum, concatenation, parentheses to control order of evaluation, and prefix operators. In addition, you can use special WHERE expression operators. These expression operators include BETWEEN-AND, CONTAINS, IS NULL or IS MISSING, LIKE, sounds-like, and SAME-AND. See Specifying an Operator .

Specifying an Operand

Variable

A variable is a column in a SAS data set. Each SAS variable has attributes like name and type (character or numeric). The variable type determines how you specify the value for which you are searching. For example:
where score > 50;
where date >= '01jan2001'd and time >= '9:00't;
where state = 'Texas';
In a WHERE expression, you cannot use automatic variables created by the DATA step (for example, FIRST.variable, LAST.variable, _N_, or variables created in assignment statements).
As in other SAS expressions, the names of numeric variables can stand alone. SAS treats numeric values of 0 or missing as false; other values as true. In the following example, the WHERE expression returns all rows where EMPNUM is not missing and not zero and ID is not missing and not zero:
where empnum and id;
The names of character variables can also stand alone. SAS selects observations where the value of the character variable is not blank. For example, the following WHERE expression returns all values not equal to blank:
where lastname;

SAS Function

A SAS function returns a value from a computation or system manipulation. Most functions use arguments that you supply, but a few obtain their arguments from the operating environment. To use a SAS function in a WHERE expression, type its name and arguments enclosed in parentheses. Some functions that you might want to specify include:
  • SUBSTR extracts a substring.
  • TODAY returns the current date.
  • PUT returns a given value using a given format.
The following DATA step produces a SAS data set that contains only observations from data set CUSTOMER in which the value of NAME begins with Mac and the value of variable CITY is Charleston or Atlanta:
data testmacs;
   set customer;
   where substr (name,1,3) = 'Mac' and 
   (city='Charleston' or city='Atlanta');
run;
The OF syntax is permitted in some SAS functions, but it cannot be used when using those functions that are specified in a WHERE clause. In the following DATA step example, OF can be used with RANGE.
data newfile;
x1=2;
x2=3;
x3=4;
r=range(of x1-x3);
run;
When you use the WHERE clause with RANGE and OF, an error is written to the SAS log.
Output When WHERE Clause Is Used with OF
proc print data=abc;
where range(of x1-x3)=6;
                    --
                    22
                    76
ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *,
**,
+, ',', -, /, <, <=, <>, =, >, >=, ?,
              AND, BETWEEN, CONTAINS, EQ, GE, GT, LE, LIKE, LT, NE, OR, ^=, |,
||, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
run;
Below is a table of SAS functions that can use the OF syntax:
SAS Functions That Use the OF Syntax
CAT
CATS
CATT
CATX
CSS
CV
GEOMEAN
GEOMEANZ
HARMEAN
HARMEANZ
KURTOSIS
MAX
MEAN
MIN
N
NMISS
ORDINAL
RANGE
RMS
SKEWNESS
STD
STDERR
SUM
USS
VAR
Note: The SAS functions that are used in a WHERE expression and can be optimized by an index are the SUBSTR function and the TRIM function.
For more information about SAS functions, see SAS Functions and CALL Routines: Reference.

Constant

A constant is a fixed value such as a number or quoted character string, that is, the value for which you are searching. A constant is a value of a variable obtained from the SAS data set, or values created within the WHERE expression itself. Constants are also called literals. For example, a constant could be a flight number or the name of a city. A constant can also be a time, date, or datetime value.
The value is either numeric or character. Note the following rules regarding whether to use quotation marks:
  • If the value is numeric, do not use quotation marks.
    where price > 200;
  • If the value is character, use quotation marks.
    where lastname eq 'Martin';
  • You can use either single or double quotation marks, but do not mix them. Quoted values must be exact matches, including case.
  • It might be necessary to use single quotation marks when double quotation marks appear in the value, or use double quotation marks when single quotation marks appear in the value.
    where item = '6" decorative pot';
    where name ? "D'Amico";
  • A SAS date constant must be enclosed in quotation marks. When you specify date values, case is not important. You can use single or double quotation marks. The following expressions are equivalent:
    where birthday = '24sep1975'd;
    where birthday = '24sep1975"d;

Specifying an Operator

Arithmetic Operators

Arithmetic operators enable you to perform a mathematical operation. The arithmetic operators include the following:
Arithmetic Operators
Symbol
Definition
Example
*
multiplication
where bonus = salary * .10;
/
division
where f = g/h;
+
addition
where c = a+b;
-
subtraction
where f = g-h;
**
exponentiation
where y = a**2;

Comparison Operators

Comparison operators (also called binary operators) compare a variable with a value or with another variable. Comparison operators propose a relationship and ask SAS to determine whether that relationship holds. For example, the following WHERE expression accesses only those observations that have the value 78753 for the numeric variable ZIPCODE:
where zipcode eq 78753;
The following table lists the comparison operators:
Comparison Operators
Symbol
Mnemonic Equivalent
Definition
Example
=
EQ
equal to
where empnum eq 3374;
^= or ~= or ¬= or <>
NE
not equal to
where status ne full-time;
>
GT
greater than
where hiredate gt '01jun1982'd;
<
LT
less than
where empnum < 2000;
>=
GE
greater than or equal to
where empnum >= 3374;
<=
LE
less than or equal to
where empnum <= 3374;
IN
equal to one from a list of values
where state in ('NC','TX');
When you do character comparisons, you can use the colon (:) modifier to compare only a specified prefix of a character string. For example, in the following WHERE expression, the colon modifier, used after the equal sign, tells SAS to look at only the first character in the values for variable LASTNAME and to select the observations with names beginning with the letter S:
where lastname=: 'S';
Note that in the SQL procedure, the colon modifier that is used in conjunction with an operator is not supported; you can use the LIKE operator instead.

IN Operator

The IN operator, which is a comparison operator, searches for character and numeric values that are equal to one from a list of values. The list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.
For example, suppose you want all sites that are in North Carolina or Texas. You could specify:
where state = 'NC' or state = 'X';
However, it is easier to use the IN operator, which selects any state in the list:
where state in ('NC','TX');
In addition, you can use the NOT logical operator to exclude a list.
where state not in ('CA', 'TN', 'MA');
You can use a shorthand notation to specify a range of sequential integers to search. The range is specified by using the syntax M:N as a value in the list to search, where M is the lower bound and N is the upper bound. M and N must be integers, and M, N, and all the integers between M and N are included in the range. For example, the following statements are equivalent.
  • y = x in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  • y = x in (1:10);

Fully Bounded Range Condition

A fully bounded range condition consists of a variable between two comparison operators, specifying both an upper and lower limit. For example, the following expression returns the employee numbers that fall within the range of 500 to 1000 (inclusive):
where 500 <= empnum <= 1000;
Note that the previous range condition expression is equivalent to the following:
where empnum >= 500 and empnum <= 1000;
You can combine the NOT logical operator with a fully bounded range condition to select observations that fall outside the range. Note that parentheses are required:
where not (500 <= empnum <= 1000);

BETWEEN-AND Operator

The BETWEEN-AND operator is also considered a fully bounded range condition that selects observations in which the value of a variable falls within an inclusive range of values.
You can specify the limits of the range as constants or expressions. Any range that you specify is an inclusive range, so that a value equal to one of the limits of the range is within the range. The general syntax for using BETWEEN-AND is as follows:
WHERE variable BETWEEN value AND value;
For example:
where empnum between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
You can combine the NOT logical operator with the BETWEEN-AND operator to select observations that fall outside the range:
where empnum not between 500 and 1000;
Note: The BETWEEN-AND operator and a fully bounded range condition produce the same results. That is, the following WHERE expressions are equivalent:
where 500 <= empnum <= 1000;
where empnum between 500 and 1000;

CONTAINS Operator

The most common usage of the CONTAINS (?) operator is to select observations by searching for a specified set of characters within the values of a character variable. The position of the string within the variable's values does not matter. However, the operator is case sensitive when making comparisons.
The following examples select observations having the values Mobay and Brisbayne for the variable COMPANY, but they do not select observations containing Bayview:
where company contains 'bay';
where company ? 'bay';
You can combine the NOT logical operator with the CONTAINS operator to select observations that are not included in a specified string:
where company not contains 'bay';
You can also use the CONTAINS operator with two variables, that is, to determine whether one variable is contained in another. When you specify two variables, keep in mind the possibility of trailing spaces, which can be resolved using the TRIM function.
proc sql;
   select *
   from table1 as a, table2 as b
   where a.fullname contains trim(b.lastname) and
      a.fullname contains trim(b.firstname);
In addition, the TRIM function is helpful when you search on a macro variable.
proc print;
   where fullname contains trim("&lname");
run;

IS NULL or IS MISSING Operator

The IS NULL or IS MISSING operator selects observations in which the value of a variable is missing. The operator selects observations with both regular or special missing value characters and can be used for both character and numeric variables.
where idnum is missing;
where name is null;
The following are equivalent for character data:
where name is null;
where name = ' ';
And the following is equivalent for numeric data. This statement differentiates missing values with special missing value characters:
where idnum <= .Z;
You can combine the NOT logical operator with IS NULL or IS MISSING to select nonmissing values, as follows:
where salary is not missing;

LIKE Operator

The LIKE operator selects observations by comparing the values of a character variable to a specified pattern, which is referred to as pattern matching. The LIKE operator is case sensitive. There are two special characters available for specifying a pattern:
percent sign (%)
specifies that any number of characters can occupy that position. The following WHERE expression selects all employees with a name that starts with the letter N. The names can be of any length.
where lastname like 'N%';
underscore (_)
matches just one character in the value for each underscore character. You can specify more than one consecutive underscore character in a pattern, and you can specify a percent sign and an underscore in the same pattern. For example, you can use different forms of the LIKE operator to select character values from this list of first names:
  • Diana
  • Diane
  • Dianna
  • Dianthus
  • Dyan
The following table shows which of these names is selected by using various forms of the LIKE operator:
Pattern
Name Selected
like 'D_an'
Dyan
like 'D_an_'
Diana, Diane
like 'D_an__'
Dianna
like 'D_an%'
all names from list
You can use a SAS character expression to specify a pattern, but you cannot use a SAS character expression that uses a SAS function.
You can combine the NOT logical operator with LIKE to select values that do not have the specified pattern, such as the following:
where frstname not like 'D_an%';
Because the % and _ characters have special meaning for the LIKE operator, you must use an escape character when searching for the % and _ characters in values. An escape character is a single character that, in a sequence of characters, signifies that what follows takes an alternative meaning. For the LIKE operator, an escape character signifies to search for literal instances of the % and _ characters in the variable's values instead of performing the special-character function.
For example, if the variable X contains the values abc, a_b, and axb, the following LIKE operator with an escape character selects only the value a_b. The escape character (/) specifies that the pattern searches for a literal ' _' that is surrounded by the characters a and b. The escape character (/) is not part of the search.
where x like 'a/_b' escape '/'; 
Without an escape character, the following LIKE operator would select the values a_b and axb. The special character underscore in the search pattern matches any single b character, including the value with the underscore:
where x like 'a_b';  
To specify an escape character, include the character in the pattern-matching expression, and then the keyword ESCAPE followed by the escape-character expression. When you include an escape character, the pattern-matching expression must be enclosed in quotation marks, and it cannot contain a column name. The escape-character expression evaluates to a single character. The operands must be character or string literals. If it is a single character, enclose it in quotation marks.
LIKE 'pattern-matching-expression' ESCAPE 'escape-character-expression'

Sounds-like Operator

The sounds-like ( =*) operator selects observations that contain a spelling variation of a specified word or words. The operator uses the Soundex algorithm to compare the variable value and the operand. For more information, see the SOUNDEX function in SAS Functions and CALL Routines: Reference.
Note: Note that the SOUNDEX algorithm is English-biased, and is less useful for languages other than English.
Although the sounds-like operator is useful, it does not always select all possible values. For example, consider that you want to select observations from the following list of names that sound like Smith:
  • Schmitt
  • Smith
  • Smithson
  • Smitt
  • Smythe
The following WHERE expression selects all the names from this list except Smithson:
where lastname=* 'Smith';
You can combine the NOT logical operator with the sounds-like operator to select values that do not contain a spelling variation of a specified word or words, such as:
where lastname not =* 'Smith';
Note: The sounds-like operator cannot be optimized with an index.

SAME-AND Operator

Use the SAME-AND operator to add more conditions to an existing WHERE expression later in the program without retyping the original conditions. This capability is useful with the following:
  • interactive SAS procedures
  • full-screen SAS procedures that enable you to type a WHERE expression on the command line
  • any type of RUN-group processing.
Use the SAME-AND operator when you already have a WHERE expression defined and you want to insert additional conditions. The SAME-AND operator has the following form:
  • where-expression-1;
  • ... SAS statements...
  • WHERE SAME AND where-expression-2;
  • ... SAS statements...
  • WHERE SAME AND where-expression-n;
SAS selects observations that satisfy the conditions after the SAME-AND operator in addition to any previously defined conditions. SAS treats all of the existing conditions as if they were conditions separated by AND operators in a single WHERE expression.
The following example shows how to use the SAME-AND operator within RUN groups in the GPLOT procedure. The SAS data set YEARS has three variables and contains quarterly data for the 2009–2011 period:
proc gplot data=years;
   plot unit*quar=year;
run; 

   where year > '01jan2009'd;
run;

   where same and year < '01jan2012'd;
run;
The following WHERE expression is equivalent to the preceding code:
where year > ''01jan2009'd and year < '01jan2012'd;

MIN and MAX Operators

Use the MIN or MAX operators to find the minimum or maximum value of two quantities. Surround the operators with the two quantities whose minimum or maximum value you want to know.
  • The MIN operator returns the lower of the two values.
  • The MAX operator returns the higher of two values.
For example, if A is less than B, then the following would return the value of A:
where x = (a min b);
Note: The symbol representation >< is not supported, and <> is interpreted as “not equal to.”

Concatenation Operator

The concatenation operator concatenates character values. You indicate the concatenation operator as follows:
  • || (two OR symbols)
  • !! (two exclamation marks)
  • ¦¦ (two broken vertical bars).
For example,
where name = 'John'||'Smith'; 

Prefix Operators

The plus sign (+) and minus sign (–) can be either prefix operators or arithmetic operators. They are prefix operators when they appear at the beginning of an expression or immediately preceding an open parentheses. A prefix operator is applied to the variable, constant, SAS function, or parenthetic expression.
where z = −(x + y);
Note: The NOT operator is also considered a prefix operator.