The SQL Procedure |
operand operator operand |
Arguments |
is one of the following:
a constant, which is a number or a quoted character string (or other special notation) that indicates a fixed value. Constants are also called literals. Constants are described in SAS Language Reference: Dictionary.
a column-name, which is described in column-name.
a CASE expression, which is described in CASE expression.
any supported SAS function. PROC SQL supports many of the functions available to the SAS DATA step. Some of the functions that aren't supported are the variable information functions, functions that work with arrays of data, and functions that operate on rows other than the current row. Other SQL databases support their own sets of functions. Functions are described in the SAS Language Reference: Dictionary.
any functions, except those with array elements, that are created with PROC FCMP.
the ANSI SQL functions COALESCE, BTRIM, LOWER, UPPER, and SUBSTRING.
a summary-function, which is described in summary-function.
a query-expression, which is described in query-expression.
the USER literal, which references the userid of the person who submitted the program. The userid that is returned is operating environment-dependent, but PROC SQL uses the same value that the &SYSJOBID macro variable has on the operating environment.
is described in Operators and the Order of Evaluation.
Note: SAS functions, including summary functions, can stand alone as SQL expressions. For example
select min(x) from table; select scan(y,4) from table;
SAS Functions |
PROC SQL supports many of the functions available to the SAS DATA step. Some of the functions that aren't supported are the variable information functions and functions that work with arrays of data. Other SQL databases support their own sets of functions. For example, the SCAN function is used in the following query:
select style, scan(street,1) format=$15. from houses;
PROC SQL also supports any user-written functions, except those functions with array elements, that are created using PROC FCMP.
See the SAS Language Reference: Dictionary for complete documentation of SAS functions. Summary functions are also SAS functions. See summary-function for more information.
USER Literal |
USER can be specified in a view definition, for example, to create a view that restricts access to the views in the user's department. Note that the USER literal value is stored in uppercase, so it is advisable to use the UPCASE function when comparing to this value:
create view myemp as select * from dept12.employees where upcase(manager)=user;
This view produces a different set of employee information for each manager who references it.
Operators and the Order of Evaluation |
The order in which operations are evaluated is the same as in the DATA step with this one exception: NOT is grouped with the logical operators AND and OR in PROC SQL; in the DATA step, NOT is grouped with the unary plus and minus signs.
Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:
3>null -3>null 0>null
You can use parentheses to group values or to nest mathematical expressions. Parentheses make expressions easier to read and can also be used to change the order of evaluation of the operators. Evaluating expressions with parentheses begins at the deepest level of parentheses and moves outward. For example, SAS evaluates A+B*C as A+(B*C), although you can add parentheses to make it evaluate as (A+B)*C for a different result.
Higher priority operations are performed first: that is, group 0 operators are evaluated before group 5 operators. The following table shows the operators and their order of evaluation, including their priority groups.
Group | Operator | Description |
---|---|---|
0 | ( ) | forces the expression enclosed to be evaluated first |
1 | case-expression | selects result values that satisfy specified conditions |
2 | ** | raises to a power |
|
unary +, unary - | indicates a positive or negative number |
3 | * | multiplies |
|
/ | divides |
4 | + | adds |
|
- | subtracts |
5 | || | concatenates |
6 | <NOT> BETWEEN condition | See BETWEEN condition. |
|
<NOT> CONTAINS condition | see CONTAINS condition. |
|
<NOT> EXISTS condition | See EXISTS condition. |
|
<NOT> IN condition | See IN condition. |
|
IS <NOT> condition | See IS condition. |
|
<NOT> LIKE condition | See LIKE condition. |
7 | =, eq | equals |
|
¬=, ^=, < >, ne | does not equal |
|
>, gt | is greater than |
|
<, lt | is less than |
|
>=, ge | is greater than or equal to |
|
<=, le | is less than or equal to |
|
=* | sounds like (use with character operands only). See Retrieving Values with the SOUNDS-LIKE Operator. |
|
eqt | equal to truncated strings (use with character operands only). See Truncated String Comparison Operators. |
|
gtt | greater than truncated strings |
|
ltt | less than truncated strings |
|
get | greater than or equal to truncated strings |
|
let | less than or equal to truncated strings |
|
net | not equal to truncated strings |
8 | ¬, ^, NOT | indicates logical NOT |
9 | &, AND | indicates logical AND |
10 | |, OR | indicates logical OR |
Symbols for operators might vary, depending on your operating environment. See SAS Language Reference: Dictionary for more information on operators and expressions.
Truncated String Comparison Operators |
PROC SQL supports truncated string comparison operators. (See Group 7 in Operators and Order of Evaluation.) In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string. For example, the expression 'TWOSTORY' eqt 'TWO' is true because the string 'TWOSTORY' is reduced to 'TWO' before the comparison is performed. Note that the truncation is performed internally; neither operand is permanently changed.
Note: Unlike the DATA step, PROC SQL does not support the colon operators (such as =:, >:, and <=:) for truncated string comparisons. Use the alphabetic operators (such as EQT, GTT, and LET).
Query Expressions (Subqueries) |
A query-expression is called a subquery when it is used in a WHERE or HAVING clause. A subquery is a query-expression that is nested as part of another query-expression. A subquery selects one or more rows from a table based on values in another table.
Depending on the clause that contains it, a subquery can return a single value or multiple values. If more than one subquery is used in a query-expression, then the innermost query is evaluated first, then the next innermost query, and so on, moving outward.
PROC SQL allows a subquery (contained in parentheses) at any point in an expression where a simple column value or constant can be used. In this case, a subquery must return a single value, that is, one row with only one column.
The following is an example of a subquery that returns one value. This PROC SQL step subsets the PROCLIB.PAYROLL table based on information in the PROCLIB.STAFF table. (PROCLIB.PAYROLL is shown in Creating a Table from a Query's Result, and PROCLIB.STAFF is shown in Joining Two Tables.) PROCLIB.PAYROLL contains employee identification numbers (IdNumber) and their salaries (Salary) but does not contain their names. If you want to return only the row from PROCLIB.PAYROLL for one employee, then you can use a subquery that queries the PROCLIB.STAFF table, which contains the employees' identification numbers and their names (Lname and Fname).
options ls=64 nodate nonumber; proc sql; title 'Information for Earl Bowden'; select * from proclib.payroll where idnumber= (select idnum from proclib.staff where upcase(lname)='BOWDEN');
Information for Earl Bowden Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1403 M ME1 28072 28JAN69 21DEC91
Subqueries can return multiple values. The following example uses the tables PROCLIB.DELAY and PROCLIB.MARCH. These tables contain information about the same flights and have the Flight column in common. The following subquery returns all the values for Flight in PROCLIB.DELAY for international flights. The values from the subquery complete the WHERE clause in the outer query. Thus, when the outer query is executed, only the international flights from PROCLIB.MARCH are in the output.
options ls=64 nodate nonumber; proc sql outobs=5; title 'International Flights from'; title2 'PROCLIB.MARCH'; select Flight, Date, Dest, Boarded from proclib.march where flight in (select flight from proclib.delay where destype='International');
Query Output - Multiple Values
International Flights from PROCLIB.MARCH Flight Date Dest Boarded ------------------------------- 219 01MAR94 LON 198 622 01MAR94 FRA 207 132 01MAR94 YYZ 115 271 01MAR94 PAR 138 219 02MAR94 LON 147
Sometimes it is helpful to compare a value with a set of values returned by a subquery. The keywords ANY or ALL can be specified before a subquery when the subquery is the right-hand operand of a comparison. If ALL is specified, then the comparison is true only if it is true for all values that are returned by the subquery. If a subquery returns no rows, then the result of an ALL comparison is true for each row of the outer query.
If ANY is specified, then the comparison is true if it is true for any one of the values that are returned by the subquery. If a subquery returns no rows, then the result of an ANY comparison is false for each row of the outer query.
The following example selects all of the employees in PROCLIB.PAYROLL who earn more than the highest paid ME3:
options ls=64 nodate nonumber ; proc sql; title "Employees who Earn More than"; title2 "All ME's"; select * from proclib.payroll where salary > all (select salary from proclib.payroll where jobcode='ME3');
Query Output Using ALL Comparison
Employees who Earn More than All ME's Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1333 M PT2 88606 30MAR61 10FEB81 1739 M PT1 66517 25DEC64 27JAN91 1428 F PT1 68767 04APR60 16NOV91 1404 M PT2 91376 24FEB53 01JAN80 1935 F NA2 51081 28MAR54 16OCT81 1905 M PT1 65111 16APR72 29MAY92 1407 M PT1 68096 23MAR69 18MAR90 1410 M PT2 84685 03MAY67 07NOV86 1439 F PT1 70736 06MAR64 10SEP90 1545 M PT1 66130 12AUG59 29MAY90 1106 M PT2 89632 06NOV57 16AUG84 1442 F PT2 84536 05SEP66 12APR88 1417 M NA2 52270 27JUN64 07MAR89 1478 M PT2 84203 09AUG59 24OCT90 1556 M PT1 71349 22JUN64 11DEC91 1352 M NA2 53798 02DEC60 16OCT86 1890 M PT2 91908 20JUL51 25NOV79 1107 M PT2 89977 09JUN54 10FEB79 1830 F PT2 84471 27MAY57 29JAN83 1928 M PT2 89858 16SEP54 13JUL90 1076 M PT1 66558 14OCT55 03OCT91
Note: See the first item in Subqueries and Efficiency for a note about efficiency when using ALL.
In order to visually separate a subquery from the rest of the query, you can enclose the subquery in any number of pairs of parentheses.
Correlated Subqueries |
In a correlated subquery, the WHERE expression in a subquery refers to values in a table in the outer query. The correlated subquery is evaluated for each row in the outer query. With correlated subqueries, PROC SQL executes the subquery and the outer query together.
The following example uses the PROCLIB.DELAY and PROCLIB.MARCH tables. A DATA step (PROCLIB.DELAY) creates PROCLIB.DELAY. PROCLIB.MARCH is shown in Producing All the Possible Combinations of the Values in a Column. PROCLIB.DELAY has the Flight, Date, Orig, and Dest columns in common with PROCLIB.MARCH:
proc sql outobs=5; title 'International Flights'; select * from proclib.march where 'International' in (select destype from proclib.delay where march.Flight=delay.Flight);
The subquery resolves by substituting every value for MARCH.Flight into the subquery's WHERE clause, one row at a time. For example, when MARCH.Flight=219 , the subquery resolves as follows:
PROC SQL retrieves all the rows from DELAY where Flight=219 and passes their DESTYPE values to the WHERE clause.
PROC SQL uses the DESTYPE values to complete the WHERE clause:
where 'International' in ('International','International', ...)
The WHERE clause checks to determine whether International is in the list. Because it is, all rows from MARCH that have a value of 219 for Flight become part of the output.
The following output contains the rows from MARCH for international flights only.
International Flights Flight Date Depart Orig Dest Miles Boarded Capacity ----------------------------------------------------------------- 219 01MAR94 9:31 LGA LON 3442 198 250 622 01MAR94 12:19 LGA FRA 3857 207 250 132 01MAR94 15:35 LGA YYZ 366 115 178 271 01MAR94 13:17 LGA PAR 3635 138 250 219 02MAR94 9:31 LGA LON 3442 147 250
Subqueries and Efficiency |
Use the MAX function in a subquery instead of the ALL keyword before the subquery. For example, the following queries produce the same result, but the second query is more efficient:
proc sql; select * from proclib.payroll where salary> all(select salary from proclib.payroll where jobcode='ME3'); proc sql; select * from proclib.payroll where salary> (select max(salary) from proclib.payroll where jobcode='ME3');
With subqueries, use IN instead of EXISTS when possible. For example, the following queries produce the same result, but the second query is usually more efficient:
proc sql; select * from proclib.payroll p where exists (select * from staff s where p.idnum=s.idnum and state='CT'); proc sql; select * from proclib.payroll where idnum in (select idnum from staff where state='CT');
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.