|Query and Reporting|
You often need data that are stored in separate tables. For example, you may want to produce a report that gets information about flight delays from one table and boarding capacity from another. Information from separate tables can be combined by matching values in columns that relate the tables. This is called joining tables. You can join a maximum of 32 tables (including views) in one query. Note that some limitations apply when you join DB2 tables. For more information, see Additional Considerations for Joining DB2 Tables.
|Introducing Table Joins|
A join combines two or more tables side by side. If you do not specify how to join the tables, you get a Cartesian product. This means that SQL combines each row from the first table with every row from the second table, the second row of the first table is then matched with every row of the second table, and so on. Therefore, two tables with 100 rows each produce 100 * 100, or 10,000, rows. You seldom want all this information, and it often causes you to exceed the available disk space.
To avoid a Cartesian product, you must specify how the tables should be combined. Typically, you want to pair rows based on matching values in one or more key columns of each table. The following example uses the tables in the following figure to illustrate a join.
Joining Two Tables
Output from Joined Tables
You can also join tables that have missing values. For example, ID 3 is missing from Table B. A left join outputs all the names in Table A (the left table) and matches them to the available salary information in Table B. If a row in Table B is missing, missing values are supplied in the joined output. In this case, Michael from Table A is output with a missing value for SALARY next to his name. The following figure shows the results of this query.
Output from Left Joined Tables
A right join works similarly. All the rows from Table B (the right table) are output, and any missing rows from table A are output as missing values (or blanks, if the values are character data). The following figure shows the results of this query.
Output from Right Joined Tables
Output from Fully Joined Tables
|Joining Tables Manually|
If tables selected for joining have a relation defined between them, Query and Reporting joins them automatically; see Joining Tables Automatically for more information. If the tables do not have a relation defined between them, or if they have more than one such relation defined, you have to join the tables manually. The following sections show examples of how to join tables manually in both cases, as well as an example of performing outer joins, which must be done manually.
The Incomplete Inner join message appears under Join. The warning Not all tables are joined appears because the tables do not contain a relation defined on them. Therefore, Query and Reporting has not automatically joined the tables. The following display shows the Query window with the two tables to be joined.
Note: To clear previously-defined subsets or expressions, select subset or Expression. Select Clear from the Edit menu, then select Close from the File menu.
Query Window with Tables for Manual Join
If you run a query with an incomplete join, you create a Cartesian product, as described in Introducing Table Joins. Depending on the setup at your site, either the query is rejected or a WARNING window appears, verifying whether or not you really want to run this query. If you select Yes to run the query, your query may take a long time to process, depending on the size of the tables.
Note: If you are joining more than five tables, not all of the table names appear in the Join window; overwrite the first number in the Tables field to see more tables.
The following display shows the Join Manager window with AIRLINE.JOBCODES and AIRLINE.PAYROLL before they are joined. The Group field indicates whether groups of tables are joined. 1,2 indicates that this group is not joined.
Type 1 next to A.JOBCODE and 1 next to B.JOBCODE and press ENTER. Note that the column names for the two tables are the same in this example. The column names do not have to be the same, but their data types and values must match. You can select Sort by from the Edit menu to reorder columns in the tables by their names, join number, index, or matching names.
After you press ENTER and the tables are joined, the highlighting of the table names is removed, the Group field changes to 1,1 (indicating one group), and a message appears saying All tables are joined. The following display shows the Join Manager window after the tables have been joined.
Join Manager Window With Two Joined Tables
The table names are displayed at the top of the Join Manager window. AIRLINE.SCHEDULE is highlighted because it is not yet joined with another table. AIRLINE.SCHEDULE has no table alias under the JOINS/relations field for the same reason. This table is also listed as a separate group in the Group field.
Join Manager Window With Third Table Before Joining
Because the columns from only two tables can be listed at the same time in this window, the Alias fields are blank and no columns are listed. AIRLINE.PAYROLL and AIRLINE.SCHEDULE have a common column, IDNUM. Type the alias B in the first Alias field and the alias C in the second Alias field. When you press ENTER, the tables' columns are listed. Type 1 next to B.IDNUM and C.IDNUM, respectively, and press ENTER.
When the three tables are joined, the highlighting is removed, the Group field changes to 1,1,1 (indicating one group), and a message appears saying All tables are joined, as shown in the following display.
Join Manager Table With Three Joined Tables
Select Close from the File menu to return to the Query window. Save or run your query. After you run your query, you can save your output to a table. See Creating a New SAS Table or View for more information.
When you join tables, Query and Reporting uses an inner join by default. However, you can choose another type of join if you are joining only two tables. In DB2, outer joins are allowed on only two tables. Note that there may be other restrictions under DB2; for further information, see Additional Considerations for Joining DB2 Tables. Outer joins must be performed manually.
Type 1 in the A.IDNUM column and 1 in the B.SUPID column and press ENTER. (These steps are described earlier in Joining Tables Without Defined Relations.)
Show SQL Window for Right Join
Exit the Show SQL window.
Select Submit from the Run menu to run the query. The following display shows the results. This query creates a list of supervisors who have been assigned to flights as well as supervisors who have not been assigned to flights.
Results of Right Join
|Defining Relations on Tables|
When you query SAS tables, relations provide a way to identify rows and to relate (or join) rows in one table to rows in another table. You define a relation on one or more columns in a table. Relations are based on the database concept of primary and foreign keys; see SAS/ACCESS for Relational Databases: Reference for more information.
You define a relation on tables based on one or more columns that tables contain. For example, the SAS AIRLINE.PAYROLL table has a column named IDNUM that has the same attributes as the IDNUM column in AIRLINE.STAFF. Therefore, you would define and name a relation for IDNUM on the two tables.
Defining a relation between or among tables enables you to build intelligence into your query creation process. When one relation is defined on tables that you join in a query, Query and Reporting automatically generates a WHERE or ON clause for that query. This saves you from defining a WHERE predicate for each join query. (See Joining Tables Automatically for more information.) You can, of course, still select Subset from the Query window to add other predicates to your WHERE clause; see Retrieving a Subset of Data for more information.
If no relations are defined or if more than one relation is defined for the tables in the join, you must join the tables manually, as described in Joining Tables Manually and Joining Tables with Multiple Relations.
Relations for Current Tables Window
Repeat steps 2 through 9 above to define a relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. Define your relation on the A.DEST and B.DEST columns. Name this relation FLDEST and give it a description of Destination Name Join . Then, repeat steps 5 through 9 to define a second relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. By typing 2 (instead of 1 ) next to the columns, define this relation on the A.DEST and B.ORIG columns. Name this relation FLORIG and give it a description of Origin Name Join .
In order to add, edit, or delete a relation, you must have the correct permissions to do so. Permission to modify relations is determined by the Query Manager in effect. Contact your SAS Support Consultant for details.
|Joining Tables Automatically|
The Query window can join tables automatically if a relation is defined on the tables. In addition, as already discussed, relations can be defined either by a DBA or by a user. (Note that for DB2, if any referential constraints are defined, these constraints are used by the Query Manager to join tables automatically.)
Query and Reporting automatically joins tables that have exactly one relation defined on them. The default action is to create an inner join. (See Joining Tables Manually for information on joining tables without a defined relation and for performing outer joins.)
By convention, relations are a property of the second of the two tables joined, in this case, AIRLINE.MARCH. To see this property, type I next to the table name in the Select window. The Table Information window appears, as shown in the following display.
Table Information Window
At this point, you can customize your query, run it, and save it as described in Saving a Query. Alternatively, you can save the query's output to a table, as described in Creating New Tables. If you intend to access the joined data frequently, it is often more efficient to save the output to a SAS table and then specify the new SAS table in your SAS code.
|Joining Tables with Multiple Relations|
Tables with multiple relations must be joined manually because you have to decide which columns to use to join the tables. This example uses the relations you defined on the AIRLINE.DEST and AIRLINE.FLINFO tables in Defining Relations on Tables.
Select Join to open the Join Manager window. The Join Manager window appears much as it does in Join Manager Window With Two Tables To Be Joined. The table aliases under the JOINS/relations field are in lowercase letters. The aliases are lowercase to indicate that a relation exists between the tables but that it is not used yet. The following display shows the tables before they are joined.
Join Manager Window With Multiple Relations
Choose Select Relation from the Edit menu, or select one of the lowercase aliases under the JOINS/relations field to select a relation with which to join. The relations for your tables are listed. Select Destination Name Join. The tables are joined and the lowercase aliases in JOINS/relations change to uppercase. Notice in the following display that a 1 automatically appeared in the A.DEST and B.DEST fields and the Group field changes to 1,1.
Join Manager Window With Multiple Relations After Joining
|Additional Considerations for Joining DB2 Tables|
When you use the Query window, you can perform inner joins or outer joins (left, right, and full) on DB2 tables. If one primary key/foreign key relation is defined between the tables, Query and Reporting joins the related tables automatically. It generates a WHERE or ON clause that is based on the related column(s) in the tables, such as where payroll.idnum=staff.idnum .
If no primary key/foreign key relation is defined or if more than one relation exists between the contributing tables, you must specify manually which columns are used to join the tables (see Joining Tables Manually).
Versions 4 and later of DB2 includes outer join operators. Therefore, when you use the Query window and specify a left, right, or full join, Query and Reporting generates the WHERE or ON clause automatically.
DB2 versions prior to Version 4 lack outer join operators. You can still specify an outer join type in Query and Reporting. If relations are defined on the tables, a WHERE or ON clause is automatically generated. If no relations are defined, you have to join the tables manually. When you review your SQL code (by selecting Show SQL from the View menu), you see that DB2 SQL has used set operators (such as UNION) to produce an outer join result, instead of using outer join operators.
Because you generate a WHERE or ON clause when you join the tables, you cannot specify additional WHERE predicates or expressions in the SELECT clause. For example, if you select Subset in the Query window, type a WHERE predicate, and exit the window, the Messages window appears with Subset is disabled due to outer join. Then Disabled appears under Subset. It works the same way with Expression.
|Data Type||Default Value|
|Character data||blank (' ')|
|Date||January 1, 0001|
|Timestamp||January 1, 0001, midnight|
These referential constraints are established by defining primary keys and foreign keys for the columns in the relationship. A primary key defines a unique identifier for the rows of a table. A foreign key is a column where each value contains a value in the primary key of another defined table. After the relationship between a primary key and a foreign key has been established, DB2 enforces the referential integrity for you according to both the rules that DB2 has established and the referential constraints that you have specified.
If referential constraints have been established, the Select window uses them to generate automatic joins between tables. In the Query, Select, or Report Engine windows, you can type the I command next to a table or column name to open the Select Table information or Column information window. These windows display information on referential constraints for your convenience.