Previous Page | Next Page

The SQL Procedure

joined-table


Joins a table with itself or with other tables or views.
Restrictions: Joins are limited to 256 tables.
See also: FROM Clause and query-expression
Featured in:

Joining Two Tables

Performing an Outer Join

Joining Three Tables

Producing All the Possible Combinations of the Values in a Column

Matching Case Rows and Control Rows


[1] table-name <<AS> alias>, table-name <<AS> alias>
<, ... table-name <<AS> alias>>
[2] table-name <<AS> alias> <INNER> JOIN table-name <<AS> alias>
ON sql-expression
[3] table-name <<AS> alias> LEFT JOIN | RIGHT JOIN | FULL JOIN
table-name <<AS> alias> ON sql-expression
[4] table-name <<AS> alias> CROSS JOIN table-name <<AS> alias>
[5] table-name <<AS> alias> UNION JOIN table-name <<AS> alias>
[6] table-name <<AS> alias> NATURAL
<INNER | FULL <OUTER> | LEFT <OUTER > | RIGHT <OUTER >> JOIN table-name <<AS> alias>

Arguments

alias

specifies an alias for table-name. The AS keyword is optional.

sql-expression

is described in sql-expression.

table-name

can be one of the following:

  • the name of a PROC SQL table.

  • the name of a SAS view or PROC SQL view.

  • a query-expression. A query-expression in the FROM clause is usually referred to as an in-line view. See "FROM Clause" for more information about in-line views.

  • a connection to a DBMS in the form of the CONNECTION TO component. See CONNECTION TO for more information.

table-name can be a one-level name, a two-level libref.table name, or a physical pathname that is enclosed in single quotation marks.

Note:   If you include parentheses, then be sure to include them in pairs. Parentheses are not valid around comma joins (type [1]).  [cautionend]


Types of Joins

[1][2] Inner join. See Inner Joins.

[3] Outer join. See Outer Joins.

[4] Cross join. See Cross Joins.

[5] Union join. See Union Joins.

[6] Natural join. See Natural Joins.


Joining Tables

When multiple tables, views, or query-expressions are listed in the FROM clause, they are processed to form one table. The resulting table contains data from each contributing table. These queries are referred to as joins.

Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table (Cartesian product) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows can be eliminated by the WHERE clause or summarized by a summary function.

A common type of join is an equijoin, in which the values from a column in the first table must equal the values of a column in the second table.


Table Limit

PROC SQL can process a maximum of 256 tables for a join. If you are using views in a join, then the number of tables on which the views are based count toward the 256-table limit. Each CONNECTION TO component in the Pass-Through Facility counts as one table.


Specifying the Rows to Be Returned

The WHERE clause or ON clause contains the conditions (sql-expression) under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins.

The expression is evaluated for each row from each table in the intermediate table described earlier in Joining Tables. The row is considered to be matching if the result of the expression is true (a nonzero, nonmissing value) for that row.

Note:   You can follow the ON clause with a WHERE clause to further subset the query result. See Performing an Outer Join for an example.  [cautionend]


Table Aliases

Table aliases are used in joins to distinguish the columns of one table from the columns in the other table or tables. A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. See FROM Clause for more information on table aliases.


Joining a Table with Itself

A single table can be joined with itself to produce more information. These joins are sometimes called reflexive joins. In these joins, the same table is listed twice in the FROM clause. Each instance of the table must have a table alias or you will not be able to distinguish between references to columns in either instance of the table. See Producing All the Possible Combinations of the Values in a Column and Matching Case Rows and Control Rows for examples.


Inner Joins

An inner join returns a result table for all the rows in a table that have one or more matching rows in the other tables, as specified by the sql-expression. Inner joins can be performed on up to 256 tables in the same query-expression.

You can perform an inner join by using a list of table-names separated by commas or by using the INNER, JOIN, and ON keywords.

The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join:

                      Left Table - LEFTTAB

                 Continent  Export    Country
                 -----------------------------
                 NA         wheat     Canada  
                 EUR        corn      France  
                 EUR        rice      Italy   
                 AFR        oil       Egypt   
                     Right Table - RIGHTTAB

                 Continent  Export    Country
                 -----------------------------
                 NA         sugar     USA     
                 EUR        corn      Spain   
                 EUR        beets     Belgium 
                 ASIA       rice      Vietnam 

The following example joins the LEFTTAB and RIGHTTAB tables to get the Cartesian product of the two tables. The Cartesian product is the result of combining every row from one table with every row from another table. You get the Cartesian product when you join two tables and do not subset them with a WHERE clause or ON clause.

proc sql;
   title 'The Cartesian Product of';
   title2 'LEFTTAB and RIGHTTAB';
   select *
      from lefttab, righttab;

Cartesian Product of LEFTTAB and RIGHTTAB Tables

                    The Cartesian Product of
                      LEFTTAB and RIGHTTAB

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
  NA         wheat     Canada    NA         sugar     USA     
  NA         wheat     Canada    EUR        corn      Spain   
  NA         wheat     Canada    EUR        beets     Belgium 
  NA         wheat     Canada    ASIA       rice      Vietnam 
  EUR        corn      France    NA         sugar     USA     
  EUR        corn      France    EUR        corn      Spain   
  EUR        corn      France    EUR        beets     Belgium 
  EUR        corn      France    ASIA       rice      Vietnam 
  EUR        rice      Italy     NA         sugar     USA     
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        rice      Italy     EUR        beets     Belgium 
  EUR        rice      Italy     ASIA       rice      Vietnam 
  AFR        oil       Egypt     NA         sugar     USA     
  AFR        oil       Egypt     EUR        corn      Spain   
  AFR        oil       Egypt     EUR        beets     Belgium 
  AFR        oil       Egypt     ASIA       rice      Vietnam 

The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the FROM clause. The following query represents an equijoin because the values of Continent from each table are matched. The column names are prefixed with the table aliases so that the correct columns can be selected.

proc sql;
   title 'Inner Join';
   select *
      from lefttab as l, righttab as r
      where l.continent=r.continent;

Inner Join

                           Inner Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
  NA         wheat     Canada    NA         sugar     USA     
  EUR        corn      France    EUR        corn      Spain   
  EUR        corn      France    EUR        beets     Belgium 
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        rice      Italy     EUR        beets     Belgium 

The following PROC SQL step is equivalent to the previous one and shows how to write an equijoin using the INNER JOIN and ON keywords.

proc sql;
   title 'Inner Join';
   select *
      from lefttab as l inner join 
           righttab as r
      on l.continent=r.continent;

See Joining Two Tables, Producing All the Possible Combinations of the Values in a Column, and Matching Case Rows and Control Rows for more examples.


Outer Joins

Outer joins are inner joins that have been augmented with rows that did not match with any row from the other table in the join. The three types of outer joins are left, right, and full.

A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the first (LEFTTAB) table that do not match any row in the second (RIGHTTAB) table.

proc sql;
   title 'Left Outer Join';
   select *
      from lefttab as l left join 
           righttab as r
      on l.continent=r.continent;

Left Outer Join

                        Left Outer Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
  AFR        oil       Egypt                                  
  EUR        rice      Italy     EUR        beets     Belgium 
  EUR        corn      France    EUR        beets     Belgium 
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        corn      France    EUR        corn      Spain   
  NA         wheat     Canada    NA         sugar     USA     

A right outer join, specified with the keywords RIGHT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the second (RIGHTTAB) table that do not match any row in the first (LEFTTAB) table.

proc sql;
   title 'Right Outer Join';
   select *
      from lefttab as l right join 
           righttab as r
      on l.continent=r.continent;

Right Outer Join

                        Right Outer Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
                                 ASIA       rice      Vietnam 
  EUR        rice      Italy     EUR        beets     Belgium 
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        corn      France    EUR        beets     Belgium 
  EUR        corn      France    EUR        corn      Spain   
  NA         wheat     Canada    NA         sugar     USA     

A full outer join, specified with the keywords FULL JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from each table that do not match any row in the other table.

proc sql;
   title 'Full Outer Join';
   select *
      from lefttab as l full join 
           righttab as r
      on l.continent=r.continent;

Full Outer Join

                        Full Outer Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
  AFR        oil       Egypt                                  
                                 ASIA       rice      Vietnam 
  EUR        rice      Italy     EUR        beets     Belgium 
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        corn      France    EUR        beets     Belgium 
  EUR        corn      France    EUR        corn      Spain   
  NA         wheat     Canada    NA         sugar     USA     

See Performing an Outer Join for another example.


Cross Joins

A cross join returns as its result table the product of the two tables.

Using the LEFTTAB and RIGHTTAB example tables, the following program demonstrates the cross join:

proc sql;
   title 'Cross Join';
   select *
      from lefttab as l cross join
         righttab as r;

Cross Join

                           Cross Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
  NA         wheat     Canada    NA         sugar     USA     
  NA         wheat     Canada    EUR        corn      Spain   
  NA         wheat     Canada    EUR        beets     Belgium 
  NA         wheat     Canada    ASIA       rice      Vietnam 
  EUR        corn      France    NA         sugar     USA     
  EUR        corn      France    EUR        corn      Spain   
  EUR        corn      France    EUR        beets     Belgium 
  EUR        corn      France    ASIA       rice      Vietnam 
  EUR        rice      Italy     NA         sugar     USA     
  EUR        rice      Italy     EUR        corn      Spain   
  EUR        rice      Italy     EUR        beets     Belgium 
  EUR        rice      Italy     ASIA       rice      Vietnam 
  AFR        oil       Egypt     NA         sugar     USA     
  AFR        oil       Egypt     EUR        corn      Spain   
  AFR        oil       Egypt     EUR        beets     Belgium 
  AFR        oil       Egypt     ASIA       rice      Vietnam 

The cross join is not functionally different from a Cartesian product join. You would get the same result by submitting the following program:

proc sql;
   select *
      from lefttab, righttab;

Do not use an ON clause with a cross join. An ON clause will cause a cross join to fail. However, you can use a WHERE clause to subset the output.


Union Joins

A union join returns a union of the columns of both tables. The union join places in the results all rows with their respective column values from each input table. Columns that do not exist in one table will have null (missing) values for those rows in the result table. The following example demonstrates a union join.

proc sql;
   title 'Union Join';
   select *
      from lefttab union join righttab;

Union Join

                           Union Join

  Continent  Export    Country   Continent  Export    Country
  ------------------------------------------------------------
                                 NA         sugar     USA     
                                 EUR        corn      Spain   
                                 EUR        beets     Belgium 
                                 ASIA       rice      Vietnam 
  NA         wheat     Canada                                 
  EUR        corn      France                                 
  EUR        rice      Italy                                  
  AFR        oil       Egypt                                  

Using a union join is similar to concatenating tables with the OUTER UNION set operator. See query-expression for more information.

Do not use an ON clause with a union join. An ON clause will cause a union join to fail.


Natural Joins

A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. An error results if two columns have the same name but different types. If join-specification is omitted when specifying a natural join, then INNER is implied. If no like columns are found, then a cross join is performed.

The following examples use these two tables:

                             table1

                         x         y         z
                  ----------------------------
                         1         2         3
                         2         1         8
                         6         5         4
                         2         5         6
                             table2

                         x         b         z
                  ----------------------------
                         1         5         3
                         3         5         4
                         2         7         8
                         6         0         4

The following program demonstrates a natural inner join.

proc sql;
   title 'Natural Inner Join';
   select *
   from table1 natural join table2;

Natural Inner Join

                       Natural Inner Join

                    x         z         b         y
             --------------------------------------
                    1         3         5         2
                    2         8         7         1
                    6         4         0         5

The following program demonstrates a natural left outer join.

proc sql;
   title 'Natural Left Outer Join';
   select *
      from table1 natural left join table2;

Natural Left Outer Join

                    Natural Left Outer Join

                    x         z         b         y
             --------------------------------------
                    1         3         5         2
                    2         6         .         5
                    2         8         7         1
                    6         4         0         5

Do not use an ON clause with a natural join. An ON clause will cause a natural join to fail. When using a natural join, an ON clause is implied, matching all like columns.


Joining More Than Two Tables

Inner joins are usually performed on two or three tables, but they can be performed on up to 256 tables in PROC SQL. You can combine several joins of the same or different types as shown in the following code lines:

a natural join b natural join c

a natural join b cross join c

You can also use parentheses to group joins together and control what joins happen in what order as shown in the following examples:

(a, b) left join c on a.X=c.Y

a left join (b full join c on b.Z=c.Z) on a.Y=b.Y

Note:   Commutative behavior varies depending on the type of join that is performed.  [cautionend]

A join on three tables is described here to explain how and why the relationships work among the tables.

In a three-way join, the sql-expression consists of two conditions: one condition relates the first table to the second table; and the other condition relates the second table to the third table. It is possible to break this example into stages. You could perform a two-way join to create a temporary table and then you could join the temporary table with the third one. However, PROC SQL can do it all in one step as shown in the next example. The final table would be the same in both cases.

The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To calculate the total revenue from exports for each country, you need to multiply the amount exported (AMOUNT table) by the price of each unit (PRICE table), and you must know the commodity that each country exports (COMM table).

                           COMM Table

                 Continent  Export    Country
                 -----------------------------
                 NA         wheat     Canada  
                 EUR        corn      France  
                 EUR        rice      Italy   
                 AFR        oil       Egypt   
                          PRICE Table

                       Export       Price
                       ------------------
                       rice          3.56
                       corn          3.45
                       oil             18
                       wheat         2.98
                          AMOUNT Table

                       Country   Quantity
                       ------------------
                       Canada       16000
                       France        2400
                       Italy          500
                       Egypt        10000

proc sql;
title  'Total Export Revenue';
select c.Country, p.Export, p.Price,
      a.Quantity, a.quantity*p.price 
      as Total    
   from comm as c JOIN price as p  
       on (c.export=p.export)
        JOIN amount as a
      on (c.country=a.country);
quit;

Three-Way Join

                     Total Export Revenue

        Country   Export       Price  Quantity     Total
       ------------------------------------------------
        Canada    wheat         2.98     16000     47680
        France    corn          3.45      2400      8280
        Italy     rice          3.56       500      1780
        Egypt     oil             18     10000    180000

See Joining Three Tables for another example.


Comparison of Joins and Subqueries

You can often use a subquery or a join to get the same result. However, it is often more efficient to use a join if the outer query and the subquery do not return duplicate rows. For example, the following queries produce the same result. The second query is more efficient:

proc sql;
   select IDNumber, Birth
      from proclib.payroll
      where IDNumber in (select idnum
                      from proclib.staff
                      where lname like 'B%');

proc sql;
   select  p.IDNumber, p.Birth
     from proclib.payroll p, proclib.staff s
     where p.idnumber=s.idnum 
           and s.lname like 'B%';

Note:   PROCLIB.PAYROLL is shown in Creating a Table from a Query's Result.  [cautionend]

Previous Page | Next Page | Top of Page