Join Operations

Overview of Join Operations

A join operation is a query that combines data from two or more tables or views based usually on relationships among the data in those tables. When multiple table specifications are listed in the FROM clause of a SELECT statement, they are processed to form one result set. The result set contains data from each contributing table and can be saved as a table or used as is. Most join operations contain at least one join condition, which is either in the FROM clause or in a WHERE clause.
For example, you can join the data of two tables based on the values of a column that exists in both tables. The following query joins the two tables Products and Sales. FedSQL creates the result set by retrieving the data for columns Product and Totals where the values match for the column Prodid.
select products.product, sales.totals
   from products, sales
   where products.prodid=sales.prodid;
Join Result Sets of Tables Products and Sales
Join Result Sets of Tables Products and Sales
Most joins are of two tables. However, you can join more than two tables. To perform a join operation of three or more tables, FedSQL first joins two tables based on the join condition. Then FedSQL joins the results to another table based on the join condition. This process continues until all tables are joined into the result set. The following query first joins tables Products and Sales, which produces a result set, and then joins the result set and the table Customers, which produces the final result set.
select products.product, sales.totals, customers.city
   from products, sales, customers
   where products.prodid=sales.prodid and sales.custid=customers.custid;
Join Result Set of Tables Products, Sales, and Customers
Join Result Set of Tables Products, Sales, and Customers
FedSQL supports several join operations such as simple joins, equijoins, cross joins, qualified joins, and natural joins. Appropriate syntax determines the type of join operation. In addition, the qualified and natural join operations can be affected by specifying the join type, which can be an inner join or an outer join.
Note: The join operation examples in this section use the tables Customers, Products, and Sales. To view the tables, see Tables Used in Examples.

Understanding the Join Operations

Simple Join

A simple join is the most basic type of join where multiple tables, separated by commas, are listed in the FROM clause of a SELECT statement. There is no join condition. Joining tables in this way produces a result set where each row from the first table is combined with each row of the second table, and so on.
This simple join example selects all columns and all rows from the tables Products and Sales.
select * from products, sales;
Simple Join of Two Tables
Simple Join of Two Tables
This example is also a simple join, but the SELECT statement specifies one column from each of three tables. Each row from the first table is combined with each row from the second table, which are then combined with each row from the third table. The result is a large, basically meaningless result set. The following output shows only a portion of the result set.
select products.product, sales.totals, customers.country
   from products, sales, customers;
Simple Join of Three Tables
Simple Join of Three Tables

Equijoin

An equijoin is a simple join that is subset with a WHERE clause. The join condition must be an equality comparison. An equijoin produces a more meaningful result than just a simple join, because only rows meeting the equality test are returned. Multiple match criteria can be specified by using the AND operator. When multiple match criteria are specified, only rows that meet all of the equality tests are returned.
This equijoin example selects all columns from the tables Products and Sales where the values match for the column Prodid, which exists in both tables. Because all columns are selected with the * notation, the Prodid column is duplicated in the result set.
select * from products, sales
   where products.prodid=sales.prodid;
Equijoin of All Columns
Equijoin of All Columns
When you specify the columns Prodid, Product, and Totals in the SELECT statement, the column Prodid is not duplicated, even though it exists in both the Products and Sales tables. The result set includes the data where the values match for the column Prodid.
select products.prodid, products.product, sales.totals 
   from products, sales
   where products.prodid=sales.prodid;
Equijoin with Specified Columns
Equijoin with Specified Columns
This equijoin example selects the columns Product, Totals, and City from the tables Products, Sales, and Customers. First, the result set includes the data where the values match for the column Prodid, which exists in tables Products and Sales. Then the result set is combined with the data where the values match for the column Custid, which exists in tables Sales and Customers.
select products.product, sales.totals, customers.city
   from products, sales, customers
   where products.prodid=sales.prodid and sales.custid=customers.custid;
Equijoin of Three Tables
Equijoin of Three Tables

Cross Join

A cross join is a relational join that results in a Cartesian product of two tables. A cross join is requested with the syntax CROSS JOIN. A cross join can be subset with a WHERE clause, but you cannot use an ON clause.
This cross join example selects all columns and all rows from the tables Products and Sales, which produces the same results as a simple join of two tables.
select * from products cross join sales;
Cross Join of Two Tables
Cross Join of Two Tables
This cross join example selects the columns Prodid, Product, and Totals from tables Products and Sales. The result set includes the data where the values match for the column Prodid. The results are the same as an equijoin of two tables.
select products.prodid, products.product, sales.totals 
   from products cross join sales
   where products.prodid=sales.prodid;
Cross Join with a WHERE Clause
Cross Join with a WHERE Clause

Qualified Join

A qualified join provides an easy way to control which rows appear in the result set. You can use any columns to match rows from one table against those from another table. A qualified join is requested with the syntax JOIN and then the syntax ON or USING to specify the join condition. You can use a WHERE clause to further subset the query results.
  • The ON clause specifies a join condition to filter the data. The ON clause accepts search conditions such as conditional expressions like the WHERE clause. The ON clause joins tables where the column names do not match in both tables. For columns that exist in both tables, the ON clause preserves the columns from each joined table separately in the result set.
  • The USING clause specifies columns to test for equality. The columns listed in the USING clause must be present in both tables. The USING clause is like a shorthand way of defining join conditions without having to specify a qualifier. The USING clause is equivalent to a join condition where each column from the left table is compared to a column with the same name in the right table. For columns that exist in both tables, the USING clause merges the columns from the joined tables into a single column.
A qualified join can be an inner join or an outer join, which is requested with the syntax INNER or OUTER. If the join type specification is omitted, then an inner join is implied. See Inner and Outer Join Types.
This qualified join example selects all columns from the tables Products and Sales. The returned rows are filtered based on the column Country in the Sales table, where the value in Country equals United States. The column Prodid exists in both tables and is duplicated in the result set.
select * from products join sales
   on (sales.country='United States');
Qualified Join with an ON Clause
Qualified Join with an ON Clause
This qualified join example selects all columns from the tables Products and Sales. The returned rows are filtered by selecting the values that match for the column Prodid, which exists in both tables. The USING clause is like a shorthand way of defining join conditions without having to specify a qualifier. The USING clause is equivalent to a join condition where each column from the left table is compared to a column with the same name in the right table. Unlike an equijoin and a cross join, the column Prodid is not duplicated in the result set.
select * from products join sales
   using (prodid);
Qualified Join with a USING Clause
Qualified Join with a USING Clause
This qualified join example selects columns Prodid, Product, and Totals from the tables Products and Sales. The returned rows are filtered based on the column Country where the value equals United States. The returned rows are further subset where the value for Product equals Rice.
select products.prodid, products.product, sales.totals 
   from products join sales
   on (sales.country='United States')
   where products.product='Rice';
Qualified Join with an ON Clause and a WHERE Clause
Qualified Join with an ON Clause and a WHERE Clause

Natural Join

A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. A natural join is requested with the syntax NATURAL JOIN. If like columns are not found, then a cross join is performed. Do not use an ON clause with a natural join. When using a natural join, an ON clause is implied, matching all like columns. You can use a WHERE clause to subset the query results. A natural join functions the same as a qualified join with the USING clause. A natural join is a shorthand of USING. Like USING, like columns appear only once in the result set.
A natural join can be an inner join or an outer join, which is requested with the syntax INNER or OUTER. If the join type specification is omitted, then an inner join is implied. See Inner and Outer Join Types.
This natural join example selects all columns from the tables Products and Sales. The result set includes the data where the values match for the column Prodid, which exists in both tables. Unlike a cross join and a simple join of two tables, the natural join result set does not include duplicate Prodid columns.
select * from products natural join sales;
Natural Join of All Columns
Natural Join of All Columns
This natural join example selects columns City and Totals from the tables Sales and Customers. The result set includes the data where the values match for the columns Custid and Country, which exist in both tables. The returned rows are subset where the value for Country equals United States.
select customers.city, sales.totals 
   from sales natural join customers
   where customers.country='United States';
Natural Join with a WHERE Clause
Natural Join with a WHERE Clause

Inner and Outer Join Types

Overview of Inner and Outer Join Types

The result set from a qualified join and a natural join can be affected by specifying the join type, which can be an inner join or an outer join. By default, qualified joins and natural joins function as inner joins.

Inner Joins

An inner join returns a result set that includes all rows from the first table that matches rows from the second table. Inner joins return only those rows that satisfy the join condition. Unmatched rows from both tables are discarded. By default, qualified joins and natural joins function as inner joins. Including the syntax INNER has no additional effects on the result set.
select * from products inner join sales 
   on (sales.country='United States');
select customers.city, sales.totals
   from sales natural inner join customers
   where country='United States';

Outer Joins

An outer join returns a result set that includes all rows that satisfy the join condition as well as unmatched rows from one or both tables. An outer join can be a left, right, or full outer join. An inner join discards any rows where the join condition is not met, but an outer joins maintains some or all of the unmatched rows.
For an outer join, a specified WHERE clause is applied after the join is performed and eliminates all rows that do not satisfy the WHERE clause. Applying a WHERE clause to an outer join can sometimes defeat the purpose, because the WHERE clause deletes the very rows that the outer join retains.
  • A left outer join preserves unmatched rows from the left table, which is the first table listed in the SELECT statement. A left outer join returns a result set that includes all rows that satisfy the join condition and rows from the left table that do not match the join condition. Therefore, a left outer join returns all rows from the left table, and only the matching rows from the right table. A left outer join is requested with the syntax LEFT [OUTER].
    This qualified join example returns a result set that includes all rows from both tables that satisfy the join condition. The join condition filters rows based on the column Country where the value equals United States. The result set also includes rows from the Customers table that do not match the join condition. As a left outer join, all rows from the Customers table are returned.
    select customers.city, sales.totals 
       from customers left outer join sales
       on (customers.country='United States');
    Left Outer Qualified Join with an ON Clause
    This natural join example returns a result set that includes all rows from both tables that satisfy the join condition, which includes the data where the values match for the column Prodid The result set also includes a row from the Sales table that does not match the join condition. As a left outer join, all rows from the Sales table are returned.
    select * from sales natural left outer join products;
    Left Outer Natural Join
  • A right outer join preserves unmatched rows from the right table, which is the second table listed in the SELECT statement. A right outer join returns a result set that includes all rows that satisfy the join condition and rows from the right table that do not match the join condition. Therefore, a right outer join returns all rows from the right table, and only the matching rows from the left table. A right outer join is requested with the syntax RIGHT [OUTER].
    This qualified join example returns a result set that includes all rows from both tables that satisfy the join condition. The join condition filters rows based on the column Country where the value equals United States. The result set also includes rows from the Sales table that do not match the join condition. As a right outer join, all rows from the Sales table are returned.
    select * from products right outer join sales
       on (sales.country='United States');
    Right Outer Qualified Join
    This natural join example returns a result set that includes all rows from both tables that satisfy the join condition, which includes the data where the values match for the column Prodid. The result set also includes a row from the Sales table that does not match the join condition. As a right outer join, all rows from the Sales table are returned.
    select * from products natural right outer join sales;
    Right Outer Natural Join
  • A full outer join preserves unmatched rows from both tables. That is, a full outer join returns all matching and unmatching rows from the left and right table. A full outer join is requested with the syntax FULL [OUTER].
    This qualified join example returns a result set that includes all rows from both tables that satisfy the join condition. The join condition filters rows based on the column Product containing the value Rice. The result set also includes all rows from both tables that do not match the join condition. As a full outer join, all rows from both tables are returned.
    select * from products full outer join sales
       on (products.product='Rice');
    Full Outer Qualified Join
    This natural join example returns a result set that includes all rows from both tables that satisfy the join condition, which includes the data where the values match for the column Prodid. The result set also includes a row from the Sales table and a row from the Products table that does not match the join condition. As a full outer join, all rows from both tables are returned.
    select * from products natural full outer join sales;
    Full Outer Natural Join

Joining Heterogeneous Data

Because typical organizations store data in multiple databases, FedSQL supports joining heterogeneous data. A heterogeneous join occurs when the tables in a join operation exist on different data sources. A heterogeneous join is one type of a federated query.
All FedSQL join operations can be performed as heterogeneous joins. For example, the following natural join is a heterogeneous join between an Oracle table and a Teradata table.
select * from oracle.product natural left outer join tera.sales;
This heterogeneous join example queries both a SAS data set and an Oracle table. PROC FEDSQL uses the attributes of the librefs to connect to the two data sources. The SAS data set MyBase.Products contains the columns Prodid and Product. The Oracle table Oracle.Sales contains the columns Prodid, Totals, and Country. The SELECT statement joins the columns and rows from both tables based on the column Prodid, which exists in both tables.
libname mybase base 'C:\Base';
libname myoracle oracle user=scott password=tiger path=oraclev11;

proc fedsql;
   select mybase.products.prodid, mybase.products.product, myoracle.sales.totals 
      from mybase.products, myoracle.sales
      where products.prodid=sales.prodid;
quit;
FedSQL Heterogeneous Join of a SAS Data Set and an Oracle Table
FedSQL Heterogeneous Join of a SAS Data Set and an Oracle Table
This heterogeneous join example queries three tables: two Oracle tables and one SAS data set. In the query, the join of MyOracle.Products and MyOracle.Customers is performed by Oracle. The join of the Oracle result set with the SAS data set MyBase.Sales is performed by FedSQL.
libname mybase base 'C:\Base';
libname myoracle oracle user=scott password=tiger path=oraclev11;

proc fedsql;
   select myoracle.products.product, mybase.sales.totals, myoracle.customers.city
      from myoracle.products, mybase.sales, myoracle.customers
      where products.prodid=sales.prodid and sales.custid=customers.custid;
quit;
FedSQL Heterogeneous Join of Two Oracle Tables and a SAS Data Set
FedSQL Heterogeneous Join of Two Oracle Tables and a SAS Data Set