The SQL Procedure |
See also: |
|
table-expression <set-operator table-expression> <...set-operator table-expression> |
Arguments |
is described in table-expression.
is one of the following:
INTERSECT <CORRESPONDING> <ALL> |
OUTER UNION <CORRESPONDING> |
UNION <CORRESPONDING> <ALL> |
EXCEPT <CORRESPONDING> <ALL> |
Query Expressions and Table Expressions |
A query-expression is one or more table-expressions. Multiple table expressions are linked by set operators. The following figure illustrates the relationship between table-expressions and query-expressions.
Set Operators |
PROC SQL provides these set operators:
concatenates the query results.
produces all unique rows from both queries.
produces rows that are part of the first query only.
produces rows that are common to both query results.
A query-expression with set operators is evaluated as follows.
Each table-expression is evaluated to produce an (internal) intermediate result table.
Each intermediate result table then becomes an operand linked with a set operator to form an expression, for example, A UNION B.
If the query-expression involves more than two table-expressions, then the result from the first two becomes an operand for the next set operator and operand, such as (A UNION B) EXCEPT C, ((A UNION B) EXCEPT C) INTERSECT D, and so on.
Evaluating a query-expression produces a single output table.
Set operators follow this order of precedence unless they are overridden by parentheses in the expressions: INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT have the same level of precedence.
PROC SQL performs set operations even if the tables or views that are referred to in the table-expressions do not have the same number of columns. The reason for this behavior is that the ANSI Standard for SQL requires that tables or views that are involved in a set operation have the same number of columns and that the columns have matching data types. If a set operation is performed on a table or view that has fewer columns than the one or ones with which it is being linked, then PROC SQL extends the table or view with fewer columns by creating columns with missing values of the appropriate data type. This temporary alteration enables the set operation to be performed correctly.
CORRESPONDING (CORR) Keyword |
The CORRESPONDING keyword is used only when a set operator is specified. CORR causes PROC SQL to match the columns in table-expressions by name and not by ordinal position. Columns that do not match by name are excluded from the result table, except for the OUTER UNION operator. See OUTER UNION.
For example, when performing a set operation on two table-expressions, PROC SQL matches the first specified column-name (listed in the SELECT clause) from one table-expression with the first specified column-name from the other. If CORR is omitted, then PROC SQL matches the columns by ordinal position.
ALL Keyword |
The set operators automatically eliminate duplicate rows from their output tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query-expression's performance. You use it when you want to display all the rows resulting from the table-expressions, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.
OUTER UNION |
Performing an OUTER UNION is very similar to performing the SAS DATA step with a SET statement. The OUTER UNION concatenates the intermediate results from the table-expressions. Thus, the result table for the query-expression contains all the rows produced by the first table-expression followed by all the rows produced by the second table-expression. Columns with the same name are in separate columns in the result table.
For example, the following query expression concatenates the ME1 and ME2 tables but does not overlay like-named columns. Outer Union of ME1 and ME2 Tables shows the result.
ME1 IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986
ME2 IDnum Jobcode Salary ---------------------------- 1653 ME2 35108 1782 ME2 35345 1244 ME2 36925
proc sql; title 'ME1 and ME2: OUTER UNION'; select * from me1 outer union select * from me2;
Outer Union of ME1 and ME2 Tables
ME1 and ME2: OUTER UNION IDnum Jobcode Salary Bonus IDnum Jobcode Salary -------------------------------------------------------------------- 1400 ME1 29769 587 . 1403 ME1 28072 342 . 1120 ME1 28619 986 . 1120 ME1 28619 986 . . . 1653 ME2 35108 . . 1782 ME2 35345 . . 1244 ME2 36925
Concatenating tables with the OUTER UNION set operator is similar to performing a union join. See Union Joins for more information.
To overlay columns with the same name, use the CORRESPONDING keyword.
proc sql; title 'ME1 and ME2: OUTER UNION CORRESPONDING'; select * from me1 outer union corr select * from me2;
ME1 and ME2: OUTER UNION CORRESPONDING IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .
In the resulting concatenated table, notice the following:
OUTER UNION CORRESPONDING retains all nonmatching columns.
For columns with the same name, if a value is missing from the result of the first table-expression, then the value in that column from the second table-expression is inserted.
The ALL keyword is not used with OUTER UNION because this operator's default action is to include all rows in a result table. Thus, both rows from the table ME1 where IDnum is 1120 appear in the output.
UNION |
The UNION operator produces a table that contains all the unique rows that result from both table-expressions. That is, the output table contains rows produced by the first table-expression, the second table-expression, or both.
Columns are appended by position in the tables, regardless of the column names. However, the data type of the corresponding columns must match or the union will not occur. PROC SQL issues a warning message and stops executing.
The names of the columns in the output table are the names of the columns from the first table-expression unless a column (such as an expression) has no name in the first table-expression. In such a case, the name of that column in the output table is the name of the respective column in the second table-expression.
In the following example, PROC SQL combines the two tables:
proc sql; title 'ME1 and ME2: UNION'; select * from me1 union select * from me2;
ME1 and ME2: UNION IDnum Jobcode Salary Bonus -------------------------------------- 1120 ME1 28619 986 1244 ME2 36925 . 1400 ME1 29769 587 1403 ME1 28072 342 1653 ME2 35108 . 1782 ME2 35345 .
In the following example, ALL includes the duplicate row from ME1. In addition, ALL changes the sorting by specifying that PROC SQL make one pass only. Thus, the values from ME2 are simply appended to the values from ME1.
proc sql; title 'ME1 and ME2: UNION ALL'; select * from me1 union all select * from me2;
ME1 and ME2: UNION ALL IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .
See Combining Two Tables for another example.
EXCEPT |
The EXCEPT operator produces (from the first table-expression) an output table that has unique rows that are not in the second table-expression. If the intermediate result from the first table-expression has at least one occurrence of a row that is not in the intermediate result of the second table-expression, then that row (from the first table-expression) is included in the result table.
In the following example, the IN_USA table contains flights to cities within and outside the USA. The OUT_USA table contains flights only to cities outside the USA.
IN_USA Flight Dest ------------------ 145 ORD 156 WAS 188 LAX 193 FRA 207 LON
OUT_USA Flight Dest ------------------ 193 FRA 207 LON 311 SJA
This example returns only the rows from IN_USA that are not also in OUT_USA:
proc sql; title 'Flights from IN_USA Only'; select * from in_usa except select * from out_usa;
IN_USA Flight Dest ------------------ 145 ORD 156 WAS 188 LAX 193 FRA 207 LON
OUT_USA Flight Dest ------------------ 193 FRA 207 LON 311 SJA
Flights from IN_USA Only Flight Dest ------------------ 145 ORD 156 WAS 188 LAX
INTERSECT |
The INTERSECT operator produces an output table that has rows that are common to both tables. For example, using the IN_USA and OUT_USA tables shown above, the following example returns rows that are in both tables:
proc sql; title 'Flights from Both IN_USA and OUT_USA'; select * from in_usa intersect select * from out_usa;
Flights from Both IN_USA and OUT_USA
Flights from Both IN_USA and OUT_USA Flight Dest ------------------ 193 FRA 207 LON
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.