Previous Page | Next Page

Practical Problem-Solving with PROC SQL

Expanding Hierarchical Data in a Table


Problem

You want to generate an output column that shows a hierarchical relationship among rows in a table.


Background Information

There is one input table, called EMPLOYEES, that contains the following data:

Sample Input Table for Expanding a Hierarchy

                     Sample Data for Expanding a Hierarchy

                                          First
               Obs     ID     LastName    Name        Supervisor

                1     1001    Smith       John           1002   
                2     1002    Johnson     Mary           None   
                3     1003    Reed        Sam            None   
                4     1004    Davis       Karen          1003   
                5     1005    Thompson    Jennifer       1002   
                6     1006    Peterson    George         1002   
                7     1007    Jones       Sue            1003   
                8     1008    Murphy      Janice         1003   
                9     1009    Garcia      Joe            1002   

You want to create output that shows the full name and ID number of each employee who has a supervisor, along with the full name and ID number of that employee's supervisor.


Solution

Use the following PROC SQL code to expand the data:

proc sql;
   title 'Expanded Employee and Supervisor Data';
   select A.ID label="Employee ID",
          trim(A.FirstName)||' '||A.LastName label="Employee Name",
          B.ID label="Supervisor ID",
          trim(B.FirstName)||' '||B.LastName label="Supervisor Name"
      from Employees A, Employees B
      where A.Supervisor=B.ID and A.Supervisor is not missing;

PROC SQL Output for Expanding a Hierarchy

                     Expanded Employee and Supervisor Data

           Employee                     Supervisor
           ID        Employee Name      ID          Supervisor Name
           ----------------------------------------------------------
           1001      John Smith         1002        Mary Johnson     
           1005      Jennifer Thompson  1002        Mary Johnson     
           1006      George Peterson    1002        Mary Johnson     
           1009      Joe Garcia         1002        Mary Johnson     
           1004      Karen Davis        1003        Sam Reed         
           1007      Sue Jones          1003        Sam Reed         
           1008      Janice Murphy      1003        Sam Reed         

How It Works

This solution uses a self-join (reflexive join) to match employees and their supervisors. The SELECT clause assigns aliases of A and B to two instances of the same table and retrieves data from each instance. From instance A, the SELECT clause

From instance B, the SELECT clause

In both concatenations, the SELECT clause uses the TRIM function to remove trailing spaces from the data in the FirstName column, and then concatenates the data with a single space and the data in the LastName column to produce a single character value for each full name.
trim(A.FirstName)||' '||A.LastName label="Employee Name"

When PROC SQL applies the WHERE clause, the two table instances are joined. The WHERE clause conditions restrict the output to only those rows in table A that have a supervisor ID that matches an employee ID in table B. This operation provides a supervisor ID and full name for each employee in the original table, except for those who do not have a supervisor.

 where A.Supervisor=B.ID and A.Supervisor is not missing;

Note:   Although there are no missing values in the Employees table, you should check for and exclude missing values from your results to avoid unexpected results. For example, if there were an employee with a blank supervisor ID number and an employee with a blank ID, then they would produce an erroneous match in the results.  [cautionend]

Previous Page | Next Page | Top of Page