Query and Reporting |
You often need data that are stored in separate tables. For example, you may want to produce a report that gets information about flight delays from one table and boarding capacity from another. Information from separate tables can be combined by matching values in columns that relate the tables. This is called joining tables. You can join a maximum of 32 tables (including views) in one query. Note that some limitations apply when you join DB2 tables. For more information, see Additional Considerations for Joining DB2 Tables.
Introducing Table Joins |
A join combines two or more tables side by side. If you do not specify how to join the tables, you get a Cartesian product. This means that SQL combines each row from the first table with every row from the second table, the second row of the first table is then matched with every row of the second table, and so on. Therefore, two tables with 100 rows each produce 100 * 100, or 10,000, rows. You seldom want all this information, and it often causes you to exceed the available disk space.
To avoid a Cartesian product, you must specify how the tables should be combined. Typically, you want to pair rows based on matching values in one or more key columns of each table. The following example uses the tables in the following figure to illustrate a join.
Joining Two Tables
Pair the SALARY information in Table B with the NAME of the person in Table A by matching the values in the ID columns. Therefore, your join criterion is A.ID=B.ID. In SQL, the code is:
select name, salary from a, b where a.id = b.id;
Only rows with matching values of ID are output. This is known as an inner join. The following figure shows the results of this query.
Output from Joined Tables
You can also join tables that have missing values. For example, ID 3 is missing from Table B. A left join outputs all the names in Table A (the left table) and matches them to the available salary information in Table B. If a row in Table B is missing, missing values are supplied in the joined output. In this case, Michael from Table A is output with a missing value for SALARY next to his name. The following figure shows the results of this query.
Output from Left Joined Tables
A right join works similarly. All the rows from Table B (the right table) are output, and any missing rows from table A are output as missing values (or blanks, if the values are character data). The following figure shows the results of this query.
Output from Right Joined Tables
If you join two tables that both have missing rows, you can combine the effect of a left and a right join by using a full join. The following figure shows the results of this query.
Output from Fully Joined Tables
Inner joins can be performed on up to 32 tables or views at a time, whereas outer joins (left, right, full) work on only two tables at a time.
Joining Tables Manually |
If tables selected for joining have a relation defined between them, Query and Reporting joins them automatically; see Joining Tables Automatically for more information. If the tables do not have a relation defined between them, or if they have more than one such relation defined, you have to join the tables manually. The following sections show examples of how to join tables manually in both cases, as well as an example of performing outer joins, which must be done manually.
Follow these instructions to join two tables manually:
Select Close from the File menu to return to the Query window.
The Incomplete Inner join message appears under Join. The warning Not all tables are joined appears because the tables do not contain a relation defined on them. Therefore, Query and Reporting has not automatically joined the tables. The following display shows the Query window with the two tables to be joined.
Note: To clear previously-defined subsets or expressions, select subset or Expression. Select Clear from the Edit menu, then select Close from the File menu.
Query Window with Tables for Manual Join
If you run a query with an incomplete join, you create a Cartesian product, as described in Introducing Table Joins. Depending on the setup at your site, either the query is rejected or a WARNING window appears, verifying whether or not you really want to run this query. If you select Yes to run the query, your query may take a long time to process, depending on the size of the tables.
Select Join to open the Join Manager window. The Join Manager window shows the selected tables and all of their columns. Before the tables are joined, their names are highlighted.
Note: If you are joining more than five tables, not all of the table names appear in the Join window; overwrite the first number in the Tables field to see more tables.
The following display shows the Join Manager window with AIRLINE.JOBCODES and AIRLINE.PAYROLL before they are joined. The Group field indicates whether groups of tables are joined. 1,2 indicates that this group is not joined.
Type 1 next to A.JOBCODE and 1 next to B.JOBCODE and press ENTER. Note that the column names for the two tables are the same in this example. The column names do not have to be the same, but their data types and values must match. You can select Sort by from the Edit menu to reorder columns in the tables by their names, join number, index, or matching names.
After you press ENTER and the tables are joined, the highlighting of the table names is removed, the Group field changes to 1,1 (indicating one group), and a message appears saying All tables are joined. The following display shows the Join Manager window after the tables have been joined.
Join Manager Window With Two Joined Tables
Select Close from the File menu to return to the Query window. Note that the message under the Join button says Complete inner join.
To add a third table to the joined AIRLINE.JOBCODES and AIRLINE.PAYROLL tables, follow these instructions:
Return to the Select window by choosing Select from the Query window.
Select Close from the File menu to return to the Query window. The message Incomplete Inner join appears again.
Select Join to open the Join Manager window.
The table names are displayed at the top of the Join Manager window. AIRLINE.SCHEDULE is highlighted because it is not yet joined with another table. AIRLINE.SCHEDULE has no table alias under the JOINS/relations field for the same reason. This table is also listed as a separate group in the Group field.
The following display shows the Join Manager window before the third table AIRLINE.SCHEDULE is joined with AIRLINE.JOBCODES and AIRLINE.PAYROLL.
Join Manager Window With Third Table Before Joining
Because the columns from only two tables can be listed at the same time in this window, the Alias fields are blank and no columns are listed. AIRLINE.PAYROLL and AIRLINE.SCHEDULE have a common column, IDNUM. Type the alias B in the first Alias field and the alias C in the second Alias field. When you press ENTER, the tables' columns are listed. Type 1 next to B.IDNUM and C.IDNUM, respectively, and press ENTER.
When the three tables are joined, the highlighting is removed, the Group field changes to 1,1,1 (indicating one group), and a message appears saying All tables are joined, as shown in the following display.
Join Manager Table With Three Joined Tables
Select Close from the File menu to return to the Query window. Save or run your query. After you run your query, you can save your output to a table. See Creating a New SAS Table or View for more information.
When you join tables, Query and Reporting uses an inner join by default. However, you can choose another type of join if you are joining only two tables. In DB2, outer joins are allowed on only two tables. Note that there may be other restrictions under DB2; for further information, see Additional Considerations for Joining DB2 Tables. Outer joins must be performed manually.
If desired, save the previous query using Save from the File menu.
Open the Select window and choose the AIRLINE.SCHEDULE and AIRLINE.SUPERV tables. These tables do not have a relation defined on them.
Return to the Query window, and note that the Incomplete Inner Join message appears under Join.
To change this query to a right join, select Join Type from the Edit menu.
Select Right join from the list of join types and Goback. When you return to the Query window, the Incomplete Right join message appears under the Join button.
Type 1 in the A.IDNUM column and 1 in the B.SUPID column and press ENTER. (These steps are described earlier in Joining Tables Without Defined Relations.)
If desired, reorder the columns of the query and then select Show SQL from the View menu. The following display shows the second display of the SQL for the right join query.
Show SQL Window for Right Join
Exit the Show SQL window.Select Submit from the Run menu to run the query. The following display shows the results. This query creates a list of supervisors who have been assigned to flights as well as supervisors who have not been assigned to flights.
Results of Right Join
Defining Relations on Tables |
When you query SAS tables, relations provide a way to identify rows and to relate (or join) rows in one table to rows in another table. You define a relation on one or more columns in a table. Relations are based on the database concept of primary and foreign keys; see SAS/ACCESS for Relational Databases: Reference for more information.
You define a relation on tables based on one or more columns that tables contain. For example, the SAS AIRLINE.PAYROLL table has a column named IDNUM that has the same attributes as the IDNUM column in AIRLINE.STAFF. Therefore, you would define and name a relation for IDNUM on the two tables.
Defining a relation between or among tables enables you to build intelligence into your query creation process. When one relation is defined on tables that you join in a query, Query and Reporting automatically generates a WHERE or ON clause for that query. This saves you from defining a WHERE predicate for each join query. (See Joining Tables Automatically for more information.) You can, of course, still select Subset from the Query window to add other predicates to your WHERE clause; see Retrieving a Subset of Data for more information.
If no relations are defined or if more than one relation is defined for the tables in the join, you must join the tables manually, as described in Joining Tables Manually and Joining Tables with Multiple Relations.
The examples in this section are used to demonstrate joining tables automatically and joining tables with multiple relations later in this chapter.
If desired, save the previous query by selecting Save from the File menu in the Query window.
Go to the Select window and select the AIRLINE.MARCH and AIRLINE.FLINFO tables. Return to the Query window.
Type a 1 next to the A.FLIGHT and B.FLIGHT columns and press ENTER. A message indicates that the tables are joined.
Follow this selection path to open the Relations for Current Tables window:View Relations Current Tables
Relations for Current Tables Window
In the Relation name field, type MARFLT . A relation name can be up to eight characters long.
In the Description field, type March - Flight Info Relation . A description can be up to 32 characters long.
Select Save to save the relation, then Goback to return to the Join Manager window.
Repeat steps 2 through 9 above to define a relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. Define your relation on the A.DEST and B.DEST columns. Name this relation FLDEST and give it a description of Destination Name Join . Then, repeat steps 5 through 9 to define a second relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. By typing 2 (instead of 1 ) next to the columns, define this relation on the A.DEST and B.ORIG columns. Name this relation FLORIG and give it a description of Origin Name Join .
In order to add, edit, or delete a relation, you must have the correct permissions to do so. Permission to modify relations is determined by the Query Manager in effect. Contact your SAS Support Consultant for details.
To modify a relation, follow these instructions:
Select the tables associated with that relation in the Select window
If you want only to change the name or description of the relation, proceed to the next step. Otherwise, modify the join as desired.
Follow this selection path:View Relations Current TablesThe Relations for Current Tables window appears, as shown in the previous display.
Type the name of the relation in the Relation name field. You can also type a ? in the Relation name field and select the relation from the resulting list.
Select Save. When prompted, select Yes to overwrite the existing relation.
To delete a relation, follow these instuctions:
Select the tables associated with that relation in the Select window.
Follow this selection path:View Relations Current TablesThe Relations for Current Tables window appears, as shown in the previous display.
Type the name of the relation in the Relation name field. You can also type a ? in the Relation name field and select the relation from the resulting list.
Select Delete, then Goback to return to the Join Manager window.
Joining Tables Automatically |
The Query window can join tables automatically if a relation is defined on the tables. In addition, as already discussed, relations can be defined either by a DBA or by a user. (Note that for DB2, if any referential constraints are defined, these constraints are used by the Query Manager to join tables automatically.)
Query and Reporting automatically joins tables that have exactly one relation defined on them. The default action is to create an inner join. (See Joining Tables Manually for information on joining tables without a defined relation and for performing outer joins.)
In this section you join the AIRLINE.MARCH and AIRLINE.FLINFO tables on which you defined a relation in the previous section.
Reset the Query window by selecting New from the File window.
Open the Select window and select the AIRLINE.MARCH and AIRLINE.FLINFO tables. Return to the Query window.
By convention, relations are a property of the second of the two tables joined, in this case, AIRLINE.MARCH. To see this property, type I next to the table name in the Select window. The Table Information window appears, as shown in the following display.
Table Information Window
Select Close from the File menu twice to return to the Query window. The message All tables are joined appears in the Query window, and the message Complete Inner join appears under the Join button.
At this point, you can customize your query, run it, and save it as described in Saving a Query. Alternatively, you can save the query's output to a table, as described in Creating New Tables. If you intend to access the joined data frequently, it is often more efficient to save the output to a SAS table and then specify the new SAS table in your SAS code.
Joining Tables with Multiple Relations |
Tables with multiple relations must be joined manually because you have to decide which columns to use to join the tables. This example uses the relations you defined on the AIRLINE.DEST and AIRLINE.FLINFO tables in Defining Relations on Tables.
Open the Select window and select AIRLINE.DEST and AIRLINE.FLINFO.
Open the Table Information window for the AIRLINE.FLINFO table to see the two relations defined on it.
Exit the Table information window and the Select window to return to the Query window. The Not all tables are joined message is displayed because you have not yet selected a relation.
Select Join to open the Join Manager window. The Join Manager window appears much as it does in Join Manager Window With Two Tables To Be Joined. The table aliases under the JOINS/relations field are in lowercase letters. The aliases are lowercase to indicate that a relation exists between the tables but that it is not used yet. The following display shows the tables before they are joined.
Join Manager Window With Multiple Relations
Choose Select Relation from the Edit menu, or select one of the lowercase aliases under the JOINS/relations field to select a relation with which to join. The relations for your tables are listed. Select Destination Name Join. The tables are joined and the lowercase aliases in JOINS/relations change to uppercase. Notice in the following display that a 1 automatically appeared in the A.DEST and B.DEST fields and the Group field changes to 1,1.
Join Manager Window With Multiple Relations After Joining
Return to the Query window to customize, save, or run your query.
Additional Considerations for Joining DB2 Tables |
When you use the Query window, you can perform inner joins or outer joins (left, right, and full) on DB2 tables. If one primary key/foreign key relation is defined between the tables, Query and Reporting joins the related tables automatically. It generates a WHERE or ON clause that is based on the related column(s) in the tables, such as where payroll.idnum=staff.idnum .
If no primary key/foreign key relation is defined or if more than one relation exists between the contributing tables, you must specify manually which columns are used to join the tables (see Joining Tables Manually).
Versions 4 and later of DB2 includes outer join operators. Therefore, when you use the Query window and specify a left, right, or full join, Query and Reporting generates the WHERE or ON clause automatically.
DB2 versions prior to Version 4 lack outer join operators. You can still specify an outer join type in Query and Reporting. If relations are defined on the tables, a WHERE or ON clause is automatically generated. If no relations are defined, you have to join the tables manually. When you review your SQL code (by selecting Show SQL from the View menu), you see that DB2 SQL has used set operators (such as UNION) to produce an outer join result, instead of using outer join operators.
Because earlier versions of DB2 SQL do not have outer join operators, certain limitations apply to outer joins in Query and Reporting:
Because you generate a WHERE or ON clause when you join the tables, you cannot specify additional WHERE predicates or expressions in the SELECT clause. For example, if you select Subset in the Query window, type a WHERE predicate, and exit the window, the Messages window appears with Subset is disabled due to outer join. Then Disabled appears under Subset. It works the same way with Expression.
Arithmetic expressions (such as those in a SELECT clause) are ignored.
Results from outer joins cannot be stored in a new DB2 table. (However, you can save the results to a SAS table.)
The following table shows the default values in outer joins.
Data Type | Default Value |
---|---|
Character data | blank (' ') |
Numeric Data | 0 |
Date | January 1, 0001 |
Time | Midnight |
Timestamp | January 1, 0001, midnight |
Referential integrity is implemented in the DB2 database management systems by specifying conditions where all references from one database column to another column must be validated.
These referential constraints are established by defining primary keys and foreign keys for the columns in the relationship. A primary key defines a unique identifier for the rows of a table. A foreign key is a column where each value contains a value in the primary key of another defined table. After the relationship between a primary key and a foreign key has been established, DB2 enforces the referential integrity for you according to both the rules that DB2 has established and the referential constraints that you have specified.
If referential constraints have been established, the Select window uses them to generate automatic joins between tables. In the Query, Select, or Report Engine windows, you can type the I command next to a table or column name to open the Select Table information or Column information window. These windows display information on referential constraints for your convenience.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.