Question from the Field
Question: When is a correlated subquery desirable or necessary?
Answer: Many queries can be evaluated by executing the subquery once and substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.
In the beginning, databases did not support joins; thus, the only method of returning the desired results was the subquery. If the query with a correlated subquery can be re-written as a join, it will be more efficient to do so. If it can't, then the correlated sub query is required to get the results.
For efficiency in subqueries, 1.) use the MAX function in a subquery instead of the ALL keyword before the subquery; and 2.) use the IN operator instead of the EXISTS keyword, when possible.
For example, the following queries produce the same result, but the second query is more efficient:
proc sql;
select * from work.payroll
where salary> all(select salary
from work.payroll
where jobcode='ME3');
proc sql;
select * from work.payroll
where salary> (select max(salary)
from work.payroll
where jobcode='ME3');
View
question archives.
These sample files and code examples are provided by SAS Institute Inc. "as is" without
warranty of any kind, either express or implied, including but not limited to the implied warranties of
merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute
shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS
Institute will provide no support for the materials contained herein.