Practical Problem-Solving with PROC SQL |
Problem |
You want to generate an output column that shows a hierarchical relationship among rows in a table.
Background Information |
There is one input table, called EMPLOYEES, that contains the following data:
Sample Input Table for Expanding a Hierarchy
Sample Data for Expanding a Hierarchy First Obs ID LastName Name Supervisor 1 1001 Smith John 1002 2 1002 Johnson Mary None 3 1003 Reed Sam None 4 1004 Davis Karen 1003 5 1005 Thompson Jennifer 1002 6 1006 Peterson George 1002 7 1007 Jones Sue 1003 8 1008 Murphy Janice 1003 9 1009 Garcia Joe 1002
You want to create output that shows the full name and ID number of each employee who has a supervisor, along with the full name and ID number of that employee's supervisor.
Solution |
Use the following PROC SQL code to expand the data:
proc sql; title 'Expanded Employee and Supervisor Data'; select A.ID label="Employee ID", trim(A.FirstName)||' '||A.LastName label="Employee Name", B.ID label="Supervisor ID", trim(B.FirstName)||' '||B.LastName label="Supervisor Name" from Employees A, Employees B where A.Supervisor=B.ID and A.Supervisor is not missing;
PROC SQL Output for Expanding a Hierarchy
Expanded Employee and Supervisor Data Employee Supervisor ID Employee Name ID Supervisor Name ---------------------------------------------------------- 1001 John Smith 1002 Mary Johnson 1005 Jennifer Thompson 1002 Mary Johnson 1006 George Peterson 1002 Mary Johnson 1009 Joe Garcia 1002 Mary Johnson 1004 Karen Davis 1003 Sam Reed 1007 Sue Jones 1003 Sam Reed 1008 Janice Murphy 1003 Sam Reed
How It Works |
This solution uses a self-join (reflexive join) to match employees and their supervisors. The SELECT clause assigns aliases of A and B to two instances of the same table and retrieves data from each instance. From instance A, the SELECT clause
selects the ID column and assigns it a label of Employee ID
selects and concatenates the FirstName and LastName columns into one output column and assigns it a label of Employee Name.
From instance B, the SELECT clause
selects the ID column and assigns it a label of Supervisor ID
selects and concatenates the FirstName and LastName columns into one output column and assigns it a label of Supervisor Name.
trim(A.FirstName)||' '||A.LastName label="Employee Name"
When PROC SQL applies the WHERE clause, the two table instances are joined. The WHERE clause conditions restrict the output to only those rows in table A that have a supervisor ID that matches an employee ID in table B. This operation provides a supervisor ID and full name for each employee in the original table, except for those who do not have a supervisor.
where A.Supervisor=B.ID and A.Supervisor is not missing;
Note: Although there are no missing values in the Employees table, you should check for and exclude missing values from your results to avoid unexpected results. For example, if there were an employee with a blank supervisor ID number and an employee with a blank ID, then they would produce an erroneous match in the results.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.