Overlaying Missing Data Values

Problem

You are forming teams for a new league by analyzing the averages of bowlers when they were members of other bowling leagues. When possible you will use each bowler's most recent league average. However, if a bowler was not in a league last year, then you will use the bowler's average from the prior year.

Background Information

There are two tables, LEAGUE1 and LEAGUE2, that contain bowling averages for last year and the prior year respectively. The structure of the tables is not identical because the data was compiled by two different secretaries. However, the tables do contain essentially the same type of data.
data league1;
input @1 Fullname $20. @21 Bowler $4. @29 AvgScore 3.;
cards;
Alexander Delarge   4224    164
John T Chance       4425
Jack T Colton       4264
                    1412    141
Andrew Shepherd     4189    185
;
data league2;
input @1 FirstName $10. @12 LastName $15. @28 AMFNo $4. @38 AvgScore 3.;
cards;
Alex       Delarge         4224      156
Mickey     Raymond         1412
                           4264      174
Jack       Chance          4425
Patrick    O'Malley        4118      164
;
proc sql;
title 'Bowling Averages from League1';
select * from league1;
title 'Bowling Averages from League2';
select * from league2;
Sample Input Tables for Overlaying Missing Values
Bowling Averages from League1 and League2

Solution

The following PROC SQL code combines the information from two tables, LEAGUE1 and LEAGUE2. The program uses all the values from the LEAGUE1 table, if available, and replaces any missing values with the corresponding values from the LEAGUE2 table. The results are shown in the following output.
proc sql;
   title "Averages from Last Year's League When Possible";
   title2 "Supplemented when Available from Prior Year's League";
   select coalesce(lastyr.fullname,trim(prioryr.firstname)
                   ||' '||prioryr.lastname)as Name format=$26.,
          coalesce(lastyr.bowler,prioryr.amfno)as Bowler,
          coalesce(lastyr.avgscore,prioryr.avgscore)as Average format=8.
      from league1 as lastyr full join league2 as prioryr
           on lastyr.bowler=prioryr.amfno
      order by Bowler;
PROC SQL Output for Overlaying Missing Values
Averages from Last Year's League When Possible

How It Works

This solution uses a full join to obtain all rows from LEAGUE1 as well as all rows from LEAGUE2. The program uses the COALESCE function on each column so that, whenever possible, there is a value for each column of a row. Using the COALESCE function on a list of expressions that is enclosed in parentheses returns the first nonmissing value that is found. For each row, the following code returns the AvgScore column from LEAGUE1 for Average:
coalesce(lastyr.avgscore,prioryr.avgscore) as Average format=8.  
If this value of AvgScore is missing, then COALESCE returns the AvgScore column from LEAGUE2 for Average. If this value of AvgScore is missing, then COALESCE returns a missing value for Average.
In the case of the Name column, the COALESCE function returns the value of FullName from LEAGUE1 if it exists. If not, then the value is obtained from LEAGUE2 by using both the TRIM function and concatenation operators to combine the first name and last name columns:
trim(prioryr.firstname)||' '||prioryr.lastname
Finally, the table is ordered by Bowler. The Bowler column is the result of the COALESCE function.
coalesce(lastyr.bowler,prioryr.amfno)as Bowler
Because the value is obtained from either table, you cannot confidently order the output by either the value of Bowler in LEAGUE1 or the value of AMFNo in LEAGUE 2, but only by the value that results from the COALESCE function.