Computing a Weighted Average

Problem

You want to compute a weighted average of a column of values.

Background Information

There is one input table, called Sample, that contains the following data:
data Sample;
   do i=1 to 10;
      Value=2983*ranuni(135);
      Weight=33*rannor(579);
      if mod(i,2)=0 then Gender='M';
         else Gender='F';
      output;
   end;
   drop i;

proc print data=Sample;
  title 'Sample Data for Weighted Average';
run;
Sample Input Table for Weighted Averages
Sample Data for Weighted Average
Note that some of the weights are negative.

Solution

Use the following PROC SQL code to obtain weighted averages that are shown in the following output:
proc sql;
   title 'Weighted Averages from Sample Data';
   select Gender, sum(Value*Weight)/sum(Weight) as WeightedAverage
      from (select Gender, Value,
                   case
                      when Weight gt 0 then Weight
                      else 0
                   end as Weight
               from Sample)
      group by Gender;
PROC SQL Output for Weighted Averages
Weighted Averages from Sample Data

How It Works

This solution uses an in-line view to create a temporary table that eliminates the negative data values in the Weight column. The in-line view is a query that performs the following tasks:
  • selects the Gender and Value columns.
  • uses a CASE expression to select the value from the Weight column. If Weight is greater than zero, then it is retrieved. If Weight is less than zero, then a value of zero is used in place of the Weight value.
           (select Gender, Value,
                   case
                      when Weight>0 then Weight
                      else 0
                   end as Weight
               from Sample)
The first, or outer, SELECT statement in the query, performs the following tasks:
  • selects the Gender column
  • constructs a weighted average from the results that were retrieved by the in-line view
The weighted average is the sum of the products of Value and Weight divided by the sum of the Weights.
select Gender, sum(Value*Weight)/sum(Weight) as WeightedAverage
Finally, the query uses a GROUP BY clause to combine the data so that the calculation is performed for each gender.
group by Gender;