Retrieving Data from Multiple Tables

Selecting Data from More Than One Table by Using Joins

The data that you need for a report could be located in more than one table. In order to select the data from the tables, join the tables in a query. Joining tables enables you to select data from multiple tables as if the data were contained in one table. Joins do not alter the original tables.

The most basic type of join is simply two tables that are listed in the FROM clause of a SELECT statement. The following query joins the two tables that are shown in Table One and Table Two and creates Cartesian Product of Table One and Table Two.

proc sql;
   title 'Table One and Table Two';
   select *
      from one, two;

Table One and Table Two

                                   Table One

                                      X  Y
                               ------------------
                                      1  2       
                                      2  3

                                   Table Two

                                      X  Z
                               ------------------
                                      2  5       
                                      3  6       
                                      4  9

Cartesian Product of Table One and Table Two

                            Table One and Table Two

                            X  Y                X  Z
                     --------------------------------------
                            1  2                2  5       
                            1  2                3  6       
                            1  2                4  9       
                            2  3                2  5       
                            2  3                3  6       
                            2  3                4  9

Joining tables in this way returns the Cartesian product of the tables. Each row from the first table is combined with every row from the second table. When you run this query, the following message is written to the SAS log:

Cartesian Product Log Message

NOTE: The execution of this query involves performing one or more Cartesian 
      product joins that can not be optimized.

The Cartesian product of large tables can be huge. Typically, you want a subset of the Cartesian product. You specify the subset by declaring the join type.

There are two types of joins:

Inner Joins return a result table for all the rows in a table that have one or more matching rows in the other table or tables that are listed in the FROM clause.
Outer Joins are inner joins that are augmented with rows that did not match with any row from the other table in the join. There are three kinds of outer joins: left, right, and full.

Inner Joins

[Inner join diagram]

An inner join returns only the subset of rows from the first table that matches rows from the second table. You can specify the columns that you want to be compared for matching values in a WHERE clause.

The following code adds a WHERE clause to the previous query. The WHERE clause specifies that only rows whose values in column X of Table One match values in column X of Table Two should appear in the output. Compare this query's output to Cartesian Product of Table One and Table Two.

proc sql;     
   select * from one, two
      where one.x=two.x;

Table One and Table Two Joined

                            Table One and Table Two

                            X  Y                X  Z
                     --------------------------------------
                            2  3                2  5

The output contains only one row because only one value in column X matches from each table. In an inner join, only the matching rows are selected. Outer joins can return nonmatching rows; they are covered in Outer Joins.

Note that the column names in the WHERE clause are prefixed by their table names. This is known as qualifying the column names, and it is necessary when you specify columns that have the same name from more than one table. Qualifying the column name avoids creating an ambiguous column reference.

Using Table Aliases

A table alias is a temporary, alternate name for a table. You specify table aliases in the FROM clause. Table aliases are used in joins to qualify column names and can make a query easier to read by abbreviating table names.

The following example compares the oil production of countries to their oil reserves by joining the OILPROD and OILRSRVS tables on their Country columns. Because the Country columns are common to both tables, they are qualified with their table aliases. You could also qualify the columns by prefixing the column names with the table names.

Note: The AS keyword is optional. [cautionend]

proc sql outobs=6;
   title 'Oil Production/Reserves of Countries';
   select * from sql.oilprod as p, sql.oilrsrvs as r
      where p.country = r.country;

Abbreviating Column Names by Using Table Aliases

                      Oil Production/Reserves of Countries

                             Barrels
  Country                     PerDay  Country                         Barrels
  ---------------------------------------------------------------------------
  Algeria                  1,400,000  Algeria                   9,200,000,000
  Canada                   2,500,000  Canada                    7,000,000,000
  China                    3,000,000  China                    25,000,000,000
  Egypt                      900,000  Egypt                     4,000,000,000
  Indonesia                1,500,000  Indonesia                 5,000,000,000
  Iran                     4,000,000  Iran                     90,000,000,000

Note that each table's Country column is displayed. Typically, once you have determined that a join is functioning correctly, you include just one of the matching columns in the SELECT clause.

Specifying the Order of Join Output

You can order the output of joined tables by one or more columns from either table. The next example's output is ordered in descending order by the BarrelsPerDay column. It is not necessary to qualify BarrelsPerDay, because the column exists only in the OILPROD table.

proc sql outobs=6;
   title 'Oil Production/Reserves of Countries';
   select p.country, barrelsperday 'Production', barrels 'Reserves'
      from sql.oilprod p, sql.oilrsrvs r
      where p.country = r.country
      order by barrelsperday desc;

Ordering the Output of Joined Tables

                      Oil Production/Reserves of Countries

          Country                        Production           Reserves
          ------------------------------------------------------------
          Saudi Arabia                    9,000,000    260,000,000,000
          United States of America        8,000,000     30,000,000,000
          Iran                            4,000,000     90,000,000,000
          Norway                          3,500,000     11,000,000,000
          Mexico                          3,400,000     50,000,000,000
          China                           3,000,000     25,000,000,000

Creating Inner Joins Using INNER JOIN Keywords

The INNER JOIN keywords can be used to join tables. The ON clause replaces the WHERE clause for specifying columns to join. PROC SQL provides these keywords primarily for compatibility with the other joins (OUTER, RIGHT, and LEFT JOIN). Using INNER JOIN with an ON clause provides the same functionality as listing tables in the FROM clause and specifying join columns with a WHERE clause.

This code produces the same output as the previous code but uses the INNER JOIN construction.

proc sql ;
   select p.country, barrelsperday 'Production', barrels 'Reserves' 
      from sql.oilprod p inner join sql.oilrsrvs r
           on p.country = r.country
   order by barrelsperday desc;

Joining Tables Using Comparison Operators

Tables can be joined by using comparison operators other than the equal sign (= ) in the WHERE clause (for a list of comparison operators, see Retrieving Rows Based on a Comparison). In this example, all U.S. cities in the USCITYCOORDS table are selected that are south of Cairo, Egypt. The compound WHERE clause specifies the city of Cairo in the WORLDCITYCOORDS table and joins USCITYCOORDS and WORLDCITYCOORDS on their Latitude columns, using a less-than (lt ) operator.

proc sql;
   title 'US Cities South of Cairo, Egypt';
   select us.City, us.State, us.Latitude, world.city, world.latitude
      from sql.worldcitycoords world, sql.uscitycoords us
      where world.city = 'Cairo' and
            us.latitude lt world.latitude;

Using Comparison Operators to Join Tables

                        US Cities South of Cairo, Egypt

   City                State  Latitude  City                        Latitude
   -------------------------------------------------------------------------
   Honolulu            HI           21  Cairo                             30
   Key West            FL           24  Cairo                             30
   Miami               FL           26  Cairo                             30
   San Antonio         TX           29  Cairo                             30
   Tampa               FL           28  Cairo                             30

When you run this query, the following message is written to the SAS log:

Comparison Query Log Message

NOTE: The execution of this query involves performing one or more Cartesian 
      product joins that can not be optimized.

Recall that you see this message when you run a query that joins tables without specifying matching columns in a WHERE clause. PROC SQL also displays this message whenever tables are joined by using an inequality operator.

The Effects of Null Values on Joins

Most database products treat nulls as distinct entities and do not match them in joins. PROC SQL treats nulls as missing values and as matches for joins. Any null will match with any other null of the same type (character or numeric) in a join.

The following example joins Table One and Table Two on column B. There are null values in column B of both tables. Notice in the output that the null value in row c of Table One matches all the null values in Table Two. This is probably not the intended result for the join.

proc sql;
   title 'One and Two Joined';
   select one.a 'One', one.b, two.a 'Two', two.b
      from one, two
      where one.b=two.b;

Joining Tables That Contain Null Values

                                   Table One

                               a                b
                               ------------------
                               a                1
                               b                2
                               c                .
                               d                4

                                   Table Two

                               a                b
                               ------------------
                               a                1
                               b                2
                               c                .
                               d                4
                               e                .
                               f                .

                               One and Two Joined

                     One              b  Two              b
                     --------------------------------------
                     a                1  a                1
                     b                2  b                2
                     c                .  c                .
                     d                4  d                4
                     c                .  e                .
                     c                .  f                .

In order to specify only the nonmissing values for the join, use the IS NOT MISSING operator:

proc sql;
   select one.a 'One', one.b, two.a 'Two', two.b
      from one, two
      where one.b=two.b and
            one.b is not missing;

Results of Adding IS NOT MISSING to Joining Tables That Contain Null Values

                               One and Two Joined

                     One              b  Two              b
                     --------------------------------------
                     a                1  a                1
                     b                2  b                2
                     d                4  d                4

Creating Multicolumn Joins

When a row is distinguished by a combination of values in more than one column, use all the necessary columns in the join. For example, a city name could exist in more than one country. To select the correct city, you must specify both the city and country columns in the joining query's WHERE clause.

This example displays the latitude and longitude of capital cities by joining the COUNTRIES table with the WORLDCITYCOORDS table. To minimize the number of rows in the example output, the first part of the WHERE expression selects capitals with names that begin with the letter L from the COUNTRIES table.

proc sql;
   title 'Coordinates of Capital Cities';
   select Capital format=$12., Name format=$12.,
          City format=$12., Country format=$12., 
          Latitude, Longitude
      from sql.countries, sql.worldcitycoords
      where Capital like 'L%' and
                  Capital = City;

London occurs once as a capital city in the COUNTRIES table. However, in WORLDCITYCOORDS, London is found twice: as a city in England and again as a city in Canada. Specifying only Capital = City in the WHERE expression yields the following incorrect output:

Selecting Capital City Coordinates (incorrect output)

                         Coordinates of Capital Cities

  Capital       Name          City          Country       Latitude  Longitude
  ---------------------------------------------------------------------------
  La Paz        Bolivia       La Paz        Bolivia            -16        -69
  London        England       London        Canada              43        -81
  Lima          Peru          Lima          Peru               -13        -77
  Lisbon        Portugal      Lisbon        Portugal            39        -10
  London        England       London        England             51          0

Notice in the output that the inner join incorrectly matches London, England, to both London, Canada, and London, England. By also joining the country name columns together (COUNTRIES.Name to WORLDCITYCOORDS.Country), the rows match correctly.

proc sql;
   title 'Coordinates of Capital Cities';
   select Capital format=$12., Name format=$12.,
          City format=$12., Country format=$12., 
          latitude, longitude
      from sql.countries, sql.worldcitycoords
      where Capital like 'L%' and
            Capital = City and
            Name = Country;

Selecting Capital City Coordinates (correct output)

                         Coordinates of Capital Cities

  Capital       Name          City          Country       Latitude  Longitude
  ---------------------------------------------------------------------------
  La Paz        Bolivia       La Paz        Bolivia            -16        -69
  Lima          Peru          Lima          Peru               -13        -77
  Lisbon        Portugal      Lisbon        Portugal            39        -10
  London        England       London        England             51          0

Selecting Data from More Than Two Tables

The data that you need could be located in more than two tables. For example, if you want to show the coordinates of the capitals of the states in the United States, then you need to join the UNITEDSTATES table, which contains the state capitals, with the USCITYCOORDS table, which contains the coordinates of cities in the United States. Because cities must be joined along with their states for an accurate join (similarly to the previous example), you must join the tables on both the city and state columns of the tables.

Joining the cities, by joining the UNITEDSTATES.Capital column to the USCITYCOORDS.City column, is straightforward. However, in the UNITEDSTATES table the Name column contains the full state name, while in USCITYCOORDS the states are specified by their postal code. It is therefore impossible to directly join the two tables on their state columns. To solve this problem, it is necessary to use the POSTALCODES table, which contains both the state names and their postal codes, as an intermediate table to make the correct relationship between UNITEDSTATES and USCITYCOORDS. The correct solution joins the UNITEDSTATES.Name column to the POSTALCODES.Name column (matching the full state names), and the POSTALCODES.Code column to the USCITYCOORDS.State column (matching the state postal codes).

title 'Coordinates of State Capitals';
proc sql outobs=10;
   select us.Capital format=$15., us.Name 'State' format=$15., 
          pc.Code, c.Latitude, c.Longitude
      from sql.unitedstates us, sql.postalcodes pc, 
           sql.uscitycoords c
      where us.Capital = c.City and
            us.Name = pc.Name and
            pc.Code = c.State;

Selecting Data from More Than Two Tables

                         Coordinates of State Capitals

          Capital          State            Code  Latitude  Longitude
          -----------------------------------------------------------
          Albany           New York         NY          43        -74
          Annapolis        Maryland         MD          39        -77
          Atlanta          Georgia          GA          34        -84
          Augusta          Maine            ME          44        -70
          Austin           Texas            TX          30        -98
          Baton Rouge      Louisiana        LA          31        -91
          Bismarck         North Dakota     ND          47       -101
          Boise            Idaho            ID          43       -116
          Boston           Massachusetts    MA          42        -72
          Carson City      Nevada           NV          39       -120

Showing Relationships within a Single Table Using Self-Joins

When you need to show comparative relationships between values in a table, it is sometimes necessary to join columns within the same table. Joining a table to itself is called a self-join, or reflexive join. You can think of a self-join as PROC SQL making an internal copy of a table and joining the table to its copy.

For example, the following code uses a self-join to select cities that have average yearly high temperatures equal to the average yearly low temperatures of other cities.

 proc sql;
    title "Cities' High Temps = Cities' Low Temps";
    select High.City format $12., High.Country format $12., 
           High.AvgHigh, ' | ',
           Low.City format $12., Low.Country format $12., 
           Low.AvgLow
       from sql.worldtemps High, sql.worldtemps Low
       where High.AvgHigh = Low.AvgLow and
             High.city ne Low.city and
             High.country ne Low.country;

Notice that the WORLDTEMPS table is assigned two aliases, High and Low . Conceptually, this makes a copy of the table so that a join can be made between the table and its copy. The WHERE clause selects those rows that have high temperature equal to low temperature.

The WHERE clause also prevents a city from being joined to itself (City ne City and Country ne Country ), although, in this case, it is highly unlikely that the high temperature would be equal to the low temperature for the same city.

Joining a Table to Itself (Self-Join)

                     Cities' High Temps = Cities' Low Temps

City          Country        AvgHigh       City          Country         AvgLow
-------------------------------------------------------------------------------
Amsterdam     Netherlands         70   |   San Juan      Puerto Rico         70
Auckland      New Zealand         75   |   Lagos         Nigeria             75
Auckland      New Zealand         75   |   Manila        Philippines         75
Berlin        Germany             75   |   Lagos         Nigeria             75
Berlin        Germany             75   |   Manila        Philippines         75
Bogota        Colombia            69   |   Bangkok       Thailand            69
Cape Town     South Africa        70   |   San Juan      Puerto Rico         70
Copenhagen    Denmark             73   |   Singapore     Singapore           73
Dublin        Ireland             68   |   Bombay        India               68
Glasgow       Scotland            65   |   Nassau        Bahamas             65
London        England             73   |   Singapore     Singapore           73
Oslo          Norway              73   |   Singapore     Singapore           73
Reykjavik     Iceland             57   |   Caracas       Venezuela           57
Stockholm     Sweden              70   |   San Juan      Puerto Rico         70

Outer Joins

Outer joins are inner joins that are augmented with rows from one table that do not match any row from the other table in the join. The resulting output includes rows that match and rows that do not match from the join's source tables. Nonmatching rows have null values in the columns from the unmatched table. Use the ON clause instead of the WHERE clause to specify the column or columns on which you are joining the tables. However, you can continue to use the WHERE clause to subset the query result.

Including Nonmatching Rows with the Left Outer Join

[Left outer join diagram]

A left outer join lists matching rows and rows from the left-hand table (the first table listed in the FROM clause) that do not match any row in the right-hand table. A left join is specified with the keywords LEFT JOIN and ON.

For example, to list the coordinates of the capitals of international cities, join the COUNTRIES table, which contains capitals, with the WORLDCITYCOORDS table, which contains cities' coordinates, by using a left join. The left join lists all capitals, regardless of whether the cities exist in WORLDCITYCOORDS. Using an inner join would list only capital cities for which there is a matching city in WORLDCITYCOORDS.

proc sql outobs=10;
   title 'Coordinates of Capital Cities';
   select Capital format=$20., Name 'Country' format=$20., 
          Latitude, Longitude
      from sql.countries a left join sql.worldcitycoords b
           on a.Capital = b.City and
              a.Name = b.Country
      order by Capital;

Left Join of COUNTRIES and WORLDCITYCOORDS

                         Coordinates of Capital Cities

        Capital               Country               Latitude  Longitude
        ---------------------------------------------------------------
                              Channel Islands              .          .
        Abu Dhabi             United Arab Emirates         .          .
        Abuja                 Nigeria                      .          .
        Accra                 Ghana                        5          0
        Addis Ababa           Ethiopia                     9         39
        Algiers               Algeria                     37          3
        Almaty                Kazakhstan                   .          .
        Amman                 Jordan                      32         36
        Amsterdam             Netherlands                 52          5
        Andorra la Vella      Andorra                      .          .

Including Nonmatching Rows with the Right Outer Join

[Right outer join diagram]

A right join, specified with the keywords RIGHT JOIN and ON, is the opposite of a left join: nonmatching rows from the right-hand table (the second table listed in the FROM clause) are included with all matching rows in the output. This example reverses the join of the last example; it uses a right join to select all the cities from the WORLDCITYCOORDS table and displays the population only if the city is the capital of a country (that is, if the city exists in the COUNTRIES table).

proc sql outobs=10;
   title 'Populations of Capitals Only';
   select City format=$20., Country 'Country' format=$20., 
          Population
      from sql.countries right join sql.worldcitycoords 
           on Capital = City and
              Name = Country
      order by City;

Right Join of COUNTRIES and WORLDCITYCOORDS

                          Populations of Capitals Only

             City                  Country               Population
             ------------------------------------------------------
             Abadan                Iran                           .
             Acapulco              Mexico                         .
             Accra                 Ghana                   17395511
             Adana                 Turkey                         .
             Addis Ababa           Ethiopia                59291170
             Adelaide              Australia                      .
             Aden                  Yemen                          .
             Ahmenabad             India                          .
             Algiers               Algeria                 28171132
             Alice Springs         Australia                      .

Selecting All Rows with the Full Outer Join

[Full outer join diagram]

A full outer join, specified with the keywords FULL JOIN and ON, selects all matching and nonmatching rows. This example displays the first ten matching and nonmatching rows from the City and Capital columns of WORLDCITYCOORDS and COUNTRIES. Note that the pound sign (# ) is used as a line split character in the labels.

proc sql outobs=10;   
   title 'Populations and/or Coordinates of World Cities';
   select City '#City#(WORLDCITYCOORDS)' format=$20.,
          Capital '#Capital#(COUNTRIES)' format=$20.,
          Population, Latitude, Longitude
      from sql.countries full join sql.worldcitycoords
           on Capital = City and
              Name = Country;

Full Outer Join of COUNTRIES and WORLDCITYCOORDS

                 Populations and/or Coordinates of World Cities

  City                  Capital
  (WORLDCITYCOORDS)     (COUNTRIES)           Population  Latitude  Longitude
  ---------------------------------------------------------------------------
                                                  146436         .          .
  Abadan                                               .        30         48
                        Abu Dhabi                2818628         .          .
                        Abuja                   99062003         .          .
  Acapulco                                             .        17       -100
  Accra                 Accra                   17395511         5          0
  Adana                                                .        37         35
  Addis Ababa           Addis Ababa             59291170         9         39
  Adelaide                                             .       -35        138
  Aden                                                 .        13         45

Specialty Joins

Three types of joins--cross joins, union joins, and natural joins--are special cases of the standard join types.

Including All Combinations of Rows with the Cross Join

A cross join is a Cartesian product; it returns the product of two tables. Like a Cartesian product, a cross join's output can be limited by a WHERE clause.

This example shows a cross join of the tables One and Two:

Tables One and Two

                                  Table One

                                     X  Y
                              ------------------
                                     1  2       
                                     2  3

                                  Table Two

                                     W  Z
                              ------------------
                                     2  5       
                                     3  6       
                                     4  9

proc sql;
   select *
      from one cross join two;

Cross Join

                                The SAS System

                           X  Y                W  Z
                    --------------------------------------
                           1  2                2  5       
                           1  2                3  6       
                           1  2                4  9       
                           2  3                2  5       
                           2  3                3  6       
                           2  3                4  9

Like a conventional Cartesian product, a cross join causes a note regarding Cartesian products in the SAS log.

Including All Rows with the Union Join

A union join combines two tables without attempting to match rows. All columns and rows from both tables are included. Combining tables with a union join is similar to combining them with the OUTER UNION set operator (see Combining Queries with Set Operators). A union join's output can be limited by a WHERE clause.

This example shows a union join of the same One and Two tables that were used earlier to demonstrate a cross join:

proc sql;
   select *
      from one union join two;

Union Join

                           X  Y                W  Z
                    --------------------------------------
                           .                   2  5       
                           .                   3  6       
                           .                   4  9       
                           1  2                .          
                           2  3                .

Matching Rows with a Natural Join

A natural join automatically selects columns from each table to use in determining matching rows. With a natural join, PROC SQL identifies columns in each table that have the same name and type; rows in which the values of these columns are equal are returned as matching rows. The ON clause is implied.

This example produces the same results as the example in Specifying the Order of Join Output:

proc sql outobs=6;
   title 'Oil Production/Reserves of Countries';
   select country, barrelsperday 'Production', barrels 'Reserve'
      from sql.oilprod natural join sql.oilrsrvs
      order by barrelsperday desc;

Natural Inner Join of OILPROD and OILRSRVS

                      Oil Production/Reserves of Countries

        Country                           Production            Reserve
        ---------------------------------------------------------------
        Saudi Arabia                       9,000,000    260,000,000,000
        United States of America           8,000,000     30,000,000,000
        Iran                               4,000,000     90,000,000,000
        Norway                             3,500,000     11,000,000,000
        Mexico                             3,400,000     50,000,000,000
        China                              3,000,000     25,000,000,000

The advantage of using a natural join is that the coding is streamlined. The ON clause is implied, and you do not need to use table aliases to qualify column names that are common to both tables. These two queries return the same results:

proc sql;
   select a.W, a.X, Y, Z
   from table1 a left join table2 b
   on a.W=b.W and a.X=b.X
   order by a.W;

proc sql;
   select W, X, Y, Z
   from table1 natural left join table2
   order by W;

If you specify a natural join on tables that do not have at least one column with a common name and type, then the result is a Cartesian product. You can use a WHERE clause to limit the output.

Because the natural join makes certain assumptions about what you want to accomplish, you should know your data thoroughly before using it. You could get unexpected or incorrect results if, for example, you are expecting two tables to have only one column in common when they actually have two. You can use the FEEDBACK option to see exactly how PROC SQL is implementing your query. See Using PROC SQL Options to Create and Debug Queries for more information about the FEEDBACK option.

A natural join assumes that you want to base the join on equal values of all pairs of common columns. To base the join on inequalities or other comparison operators, use standard inner or outer join syntax.

Using the Coalesce Function in Joins

As you can see from the previous examples, the nonmatching rows in outer joins contain missing values. By using the COALESCE function, you can overlay columns so that only the row from the table that contains data is listed. Recall that COALESCE takes a list of columns as its arguments and returns the first nonmissing value that it encounters.

This example adds the COALESCE function to the previous example to overlay the COUNTRIES.Capital, WORLDCITYCOORDS.City, and COUNTRIES.Name columns. COUNTRIES.Name is supplied as an argument to COALESCE because some islands do not have capitals.

proc sql outobs=10;   
   title 'Populations and/or Coordinates of World Cities';
   select coalesce(Capital, City,Name)format=$20. 'City',
          coalesce(Name, Country) format=$20. 'Country',
          Population, Latitude, Longitude
      from sql.countries full join sql.worldcitycoords
           on Capital = City and
           Name = Country;

Using COALESCE in Full Outer Join of COUNTRIES and WORLDCITYCOORDS

                 Populations and/or Coordinates of World Cities

  City                  Country               Population  Latitude  Longitude
  ---------------------------------------------------------------------------
  Channel Islands       Channel Islands           146436         .          .
  Abadan                Iran                           .        30         48
  Abu Dhabi             United Arab Emirates     2818628         .          .
  Abuja                 Nigeria                 99062003         .          .
  Acapulco              Mexico                         .        17       -100
  Accra                 Ghana                   17395511         5          0
  Adana                 Turkey                         .        37         35
  Addis Ababa           Ethiopia                59291170         9         39
  Adelaide              Australia                      .       -35        138
  Aden                  Yemen                          .        13         45

COALESCE can be used in both inner and outer joins. For more information about COALESCE, see Replacing Missing Values.

Comparing DATA Step Match-Merges with PROC SQL Joins

Many SAS users are familiar with using a DATA step to merge data sets. This section compares merges to joins. DATA step match-merges and PROC SQL joins can produce the same results. However, a significant difference between a match-merge and a join is that you do not have to sort the tables before you join them.

When All of the Values Match

When all of the values match in the BY variable and there are no duplicate BY variables, you can use an inner join to produce the same result as a match-merge. To demonstrate this result, here are two tables that have the column Flight in common. The values of Flight are the same in both tables:

FLTSUPER                    FLTDEST

Flight  Supervisor          Flight  Destination

  145   Kang                  145   Brussels
  150   Miller                150   Paris
  155   Evanko                155   Honolulu

FLTSUPER and FLTDEST are already sorted by the matching column Flight. A DATA step merge produces Merged Tables When All the Values Match.

data merged;
   merge FltSuper FltDest;
   by Flight;
run;

proc print data=merged noobs;
   title 'Table MERGED';
run;

Merged Tables When All the Values Match

                                  Table MERGED

                      Flight    Supervisor    Destination

                        145       Kang         Brussels  
                        150       Miller       Paris     
                        155       Evanko       Honolulu

With PROC SQL, presorting the data is not necessary. The following PROC SQL join gives the same result as that shown in Merged Tables When All the Values Match.

proc sql;
   title 'Table MERGED';
   select s.flight, Supervisor, Destination
      from fltsuper s, fltdest d
      where s.Flight=d.Flight;

When Only Some of the Values Match

When only some of the values match in the BY variable, you can use an outer join to produce the same result as a match-merge. To demonstrate this result, here are two tables that have the column Flight in common. The values of Flight are not the same in both tables:

FLTSUPER                    FLTDEST

Flight  Supervisor          Flight  Destination

  145   Kang                  145   Brussels
  150   Miller                150   Paris
  155   Evanko                165   Seattle
  157   Lei

A DATA step merge produces Merged Tables When Some of the Values Match:

data merged;
   merge fltsuper fltdest;
   by flight;
run;
proc print data=merged noobs;
   title 'Table MERGED';
run;

Merged Tables When Some of the Values Match

                                  Table MERGED

                      Flight    Supervisor    Destination

                        145       Kang         Brussels  
                        150       Miller       Paris     
                        155       Evanko                 
                        157       Lei                    
                        165                    Seattle

To get the same result with PROC SQL, use an outer join so that the query result will contain the nonmatching rows from the two tables. In addition, use the COALESCE function to overlay the Flight columns from both tables. The following PROC SQL join gives the same result as that shown in Merged Tables When Some of the Values Match:

proc sql;
   select coalesce(s.Flight,d.Flight) as Flight, Supervisor, Destination
      from fltsuper s full join fltdest d
           on s.Flight=d.Flight;

When the Position of the Values Is Important

When you want to merge two tables and the position of the values is important, you might need to use a DATA step merge. To demonstrate this idea, here are two tables to consider:

FLTSUPER                      FLTDEST

Flight  Supervisor           Flight  Destination

  145   Kang                  145    Brussels
  145   Ramirez               145    Edmonton
  150   Miller                150    Paris
  150   Picard                150    Madrid
  155   Evanko                165    Seattle
  157   Lei

For Flight 145, Kang matches with Brussels and Ramirez matches with Edmonton. Because the DATA step merges data based on the position of values in BY groups, the values of Supervisor and Destination match appropriately. A DATA step merge produces Match-Merge of the FLTSUPER and FLTDEST Tables:

data merged;
   merge fltsuper fltdest;
   by flight;
run;
proc print data=merged noobs;
   title 'Table MERGED';
run;

Match-Merge of the FLTSUPER and FLTDEST Tables

                                  Table MERGED

                      Flight    Supervisor    Destination

                        145      Kang          Brussels  
                        145      Ramirez       Edmonton  
                        150      Miller        Paris     
                        150      Picard        Madrid    
                        155      Evanko                  
                        157      Lei                     
                        165                    Seattle

PROC SQL does not process joins according to the position of values in BY groups. Instead, PROC SQL processes data only according to the data values. Here is the result of an inner join for FLTSUPER and FLTDEST:

proc sql;
   title 'Table JOINED';
   select *
      from fltsuper s, fltdest d
      where s.Flight=d.Flight;

PROC SQL Join of the FLTSUPER and FLTDEST Tables

                                  Table JOINED

                    Flight  Supervisor    Flight  Destination
                  -------------------------------------------
                       145  Kang             145  Brussels   
                       145  Kang             145  Edmonton   
                       145  Ramirez          145  Brussels   
                       145  Ramirez          145  Edmonton   
                       150  Miller           150  Paris      
                       150  Miller           150  Madrid     
                       150  Picard           150  Paris      
                       150  Picard           150  Madrid

PROC SQL builds the Cartesian product and then lists the rows that meet the WHERE clause condition. The WHERE clause returns two rows for each supervisor, one row for each destination. Because Flight has duplicate values and there is no other matching column, there is no way to associate Kang only with Brussels, Ramirez only with Edmonton, and so on.

For more information about DATA step match-merges, see SAS Language Reference: Dictionary.

Top of Page