Practical Problem-Solving with PROC SQL |
Problem |
You are forming teams for a new league by analyzing the averages of bowlers when they were members of other bowling leagues. When possible you will use each bowler's most recent league average. However, if a bowler was not in a league last year, then you will use the bowler's average from the prior year.
Background Information |
There are two tables, LEAGUE1 and LEAGUE2, that contain bowling averages for last year and the prior year respectively. The structure of the tables is not identical because the data was compiled by two different secretaries. However, the tables do contain essentially the same type of data.
Sample Input Tables for Overlaying Missing Values
Bowling Averages from League1 Fullname Bowler AvgScore -------------------------------------- Alexander Delarge 4224 164 John T Chance 4425 . Jack T Colton 4264 . 1412 141 Andrew Shepherd 4189 185
Bowling Averages from League2 FirstName LastName AMFNo AvgScore -------------------------------------------- Alex Delarge 4224 156 Mickey Raymond 1412 . 4264 174 Jack Chance 4425 . Patrick O'Malley 4118 164
Solution |
The following PROC SQL code combines the information from two tables, LEAGUE1 and LEAGUE2. The program uses all the values from the LEAGUE1 table, if available, and replaces any missing values with the corresponding values from the LEAGUE2 table. The results are shown in the following output.
options nodate nonumber linesize=80 pagesize=60; proc sql; title "Averages from Last Year's League When Possible"; title2 "Supplemented when Available from Prior Year's League"; select coalesce(lastyr.fullname,trim(prioryr.firstname) ||' '||prioryr.lastname)as Name format=$26., coalesce(lastyr.bowler,prioryr.amfno)as Bowler, coalesce(lastyr.avgscore,prioryr.avgscore)as Average format=8. from league1 as lastyr full join league2 as prioryr on lastyr.bowler=prioryr.amfno order by Bowler;
PROC SQL Output for Overlaying Missing Values
Averages from Last Year's League When Possible Supplemented when Available from Prior Year's League Name Bowler Average -------------------------------------------- Mickey Raymond 1412 141 Patrick O'Malley 4118 164 Andrew Shepherd 4189 185 Alexander Delarge 4224 164 Jack T Colton 4264 174 John T Chance 4425 .
How It Works |
This solution uses a full join to obtain all rows from LEAGUE1 as well as all rows from LEAGUE2. The program uses the COALESCE function on each column so that, whenever possible, there is a value for each column of a row. Using the COALESCE function on a list of expressions that is enclosed in parentheses returns the first nonmissing value that is found. For each row, the following code returns the AvgScore column from LEAGUE1 for Average:
coalesce(lastyr.avgscore,prioryr.avgscore) as Average format=8.
If this value of AvgScore is missing, then COALESCE returns the AvgScore column from LEAGUE2 for Average. If this value of AvgScore is missing, then COALESCE returns a missing value for Average.
In the case of the Name column, the COALESCE function returns the value of FullName from LEAGUE1 if it exists. If not, then the value is obtained from LEAGUE2 by using both the TRIM function and concatenation operators to combine the first name and last name columns:
trim(prioryr.firstname)||' '||prioryr.lastname
Finally, the table is ordered by Bowler. The Bowler column is the result of the COALESCE function.
coalesce(lastyr.bowler,prioryr.amfno)as Bowler
Because the value is obtained from either table, you cannot confidently order the output by either the value of Bowler in LEAGUE1 or the value of AMFNo in LEAGUE 2, but only by the value that results from the COALESCE function.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.