Reweighting Observations in an Analysis

Reweighting observations is an interactive feature of PROC REG that enables you to change the weights of observations used in computing the regression equation. Observations can also be deleted from the analysis (not from the data set) by changing their weights to zero. In the following statements, the Class data (in the section Getting Started: REG Procedure) are used to illustrate some of the features of the REWEIGHT statement. First, the full model is fit, and the residuals are displayed in Figure 76.40.

proc reg data=Class;
   model Weight=Age Height / p;
   id Name;
run;

Figure 76.40 Full Model for Class Data, Residuals Shown
The REG Procedure
Model: MODEL1
Dependent Variable: Weight

Output Statistics
Obs  Name  Dependent Variable  Predicted Value  Residual
1 Alfred 112.5000 124.8686 -12.3686
2 Alice 84.0000 78.6273 5.3727
3 Barbara 98.0000 110.2812 -12.2812
4 Carol 102.5000 102.5670 -0.0670
5 Henry 102.5000 105.0849 -2.5849
6 James 83.0000 80.2266 2.7734
7 Jane 84.5000 89.2191 -4.7191
8 Janet 112.5000 102.7663 9.7337
9 Jeffrey 84.0000 100.2095 -16.2095
10 John 99.5000 86.3415 13.1585
11 Joyce 50.5000 57.3660 -6.8660
12 Judy 90.0000 107.9625 -17.9625
13 Louise 77.0000 76.6295 0.3705
14 Mary 112.0000 117.1544 -5.1544
15 Philip 150.0000 138.2164 11.7836
16 Robert 128.0000 107.2043 20.7957
17 Ronald 133.0000 118.9529 14.0471
18 Thomas 85.0000 79.6676 5.3324
19 William 112.0000 117.1544 -5.1544

Sum of Residuals 0
Sum of Squared Residuals 2120.09974
Predicted Residual SS (PRESS) 3272.72186

Upon examining the data and residuals, you realize that observation 17 (Ronald) was mistakenly included in the analysis. Also, you would like to examine the effect of reweighting to 0.5 those observations with residuals that have absolute values greater than or equal to 17. The following statements show how you request this reweighting:

reweight obs.=17;
reweight r. le -17 or r. ge 17 / weight=0.5;
print p;
run;

At this point, a message appears (in the log) that tells you which observations have been reweighted and what the new weights are. Figure 76.41 is produced.

Figure 76.41 Model with Reweighted Observations
The REG Procedure
Model: MODEL1.2
Dependent Variable: Weight

Output Statistics
Obs  Name  Weight Variable  Dependent Variable  Predicted Value  Residual
1 Alfred 1.0000 112.5000 121.6250 -9.1250
2 Alice 1.0000 84.0000 79.9296 4.0704
3 Barbara 1.0000 98.0000 107.5484 -9.5484
4 Carol 1.0000 102.5000 102.1663 0.3337
5 Henry 1.0000 102.5000 104.3632 -1.8632
6 James 1.0000 83.0000 79.9762 3.0238
7 Jane 1.0000 84.5000 87.8225 -3.3225
8 Janet 1.0000 112.5000 103.6889 8.8111
9 Jeffrey 1.0000 84.0000 98.7606 -14.7606
10 John 1.0000 99.5000 85.3117 14.1883
11 Joyce 1.0000 50.5000 58.6811 -8.1811
12 Judy 0.5000 90.0000 106.8740 -16.8740
13 Louise 1.0000 77.0000 76.8377 0.1623
14 Mary 1.0000 112.0000 116.2429 -4.2429
15 Philip 1.0000 150.0000 135.9688 14.0312
16 Robert 0.5000 128.0000 103.5150 24.4850
17 Ronald 0 133.0000 117.8121 15.1879
18 Thomas 1.0000 85.0000 78.1398 6.8602
19 William 1.0000 112.0000 116.2429 -4.2429

Sum of Residuals 0
Sum of Squared Residuals 1500.61194
Predicted Residual SS (PRESS) 2287.57621

The first REWEIGHT statement excludes observation 17, and the second REWEIGHT statement reweights observations 12 and 16 to 0.5. An important feature of this example is that REWEIGHT statements do not cause the model to be refit; the model is not refit until after the PRINT statement. This allows multiple REWEIGHT statements to be applied to a subsequent model. As a consequence, the second REWEIGHT statement selects observations by their residuals from the full model in Figure 76.40, not by residuals from a model that already excludes observation 17.
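The effect of the new weights on the summary statistics can be checked by hand. The following Python sketch is not part of the SAS session; it simply transcribes the Weight and Residual columns of Figure 76.41 and sums the weighted squared residuals. The function name `weighted_ssr` is made up for this illustration. The arithmetic suggests that the reported Sum of Squared Residuals is the weighted sum, so that observation 17 (weight 0) contributes nothing and observations 12 and 16 contribute half of their squared residuals.

```python
# Weights and residuals transcribed from Figure 76.41, observations 1..19.
weights = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 1, 1, 1, 0.5, 0, 1, 1]
residuals = [
    -9.1250, 4.0704, -9.5484, 0.3337, -1.8632, 3.0238, -3.3225,
    8.8111, -14.7606, 14.1883, -8.1811, -16.8740, 0.1623, -4.2429,
    14.0312, 24.4850, 15.1879, 6.8602, -4.2429,
]


def weighted_ssr(weights, residuals):
    """Sum of w_i * r_i**2 (hypothetical helper, not a SAS function)."""
    return sum(w * r * r for w, r in zip(weights, residuals))


ssr = weighted_ssr(weights, residuals)
print(f"{ssr:.2f}")  # 1500.61, matching the 1500.61194 shown in Figure 76.41
```

The small remaining difference comes from the residuals being printed to only four decimal places.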

Note that the model label in Figure 76.41 has changed from MODEL1 to MODEL1.2, since two REWEIGHT statements have been used. In this example, the intent is to reweight observations with large residuals, so the observation that was mistakenly included in the analysis should be deleted first; then the model should be fit to the remaining observations, and only then should the observations with large residuals be reweighted. To accomplish this, use the REFIT statement between the two REWEIGHT statements. The following statements produce Figure 76.42:

reweight allobs / weight=1.0;
reweight obs.=17;
refit;
reweight r. le -17 or r. ge 17 / weight=.5;
print;
run;

Figure 76.42 Observations Excluded from Analysis, Model Refitted, and Observations Reweighted
The REG Procedure
Model: MODEL1.5
Dependent Variable: Weight

Output Statistics
Obs  Name  Weight Variable  Dependent Variable  Predicted Value  Residual
1 Alfred 1.0000 112.5000 120.9716 -8.4716
2 Alice 1.0000 84.0000 79.5342 4.4658
3 Barbara 1.0000 98.0000 107.0746 -9.0746
4 Carol 1.0000 102.5000 101.5681 0.9319
5 Henry 1.0000 102.5000 103.7588 -1.2588
6 James 1.0000 83.0000 79.7204 3.2796
7 Jane 1.0000 84.5000 87.5443 -3.0443
8 Janet 1.0000 112.5000 102.9467 9.5533
9 Jeffrey 1.0000 84.0000 98.3117 -14.3117
10 John 1.0000 99.5000 85.0407 14.4593
11 Joyce 1.0000 50.5000 58.6253 -8.1253
12 Judy 1.0000 90.0000 106.2625 -16.2625
13 Louise 1.0000 77.0000 76.5908 0.4092
14 Mary 1.0000 112.0000 115.4651 -3.4651
15 Philip 1.0000 150.0000 134.9953 15.0047
16 Robert 0.5000 128.0000 103.1923 24.8077
17 Ronald 0 133.0000 117.0299 15.9701
18 Thomas 1.0000 85.0000 78.0288 6.9712
19 William 1.0000 112.0000 115.4651 -3.4651

Sum of Residuals 0
Sum of Squared Residuals 1637.81879
Predicted Residual SS (PRESS) 2473.87984

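Notice that this results in a slightly different model than the previous set of statements: because the residual condition is now evaluated after the model has been refit without observation 17, only observation 16 is reweighted to 0.5. Also note that the model label is now MODEL1.5, since five REWEIGHT statements have been used for this model (the two issued earlier plus the three in this step).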

Another important feature of the REWEIGHT statement is the ability to nullify the effect of the most recent REWEIGHT statement or of all previous REWEIGHT statements. First, assume that you have several REWEIGHT statements in effect and you want to restore the original weights of all the observations. The following REWEIGHT statement accomplishes this and produces Figure 76.43:

reweight allobs / reset;
print;
run;

Figure 76.43 Restoring Weights of All Observations
The REG Procedure
Model: MODEL1.6
Dependent Variable: Weight

Output Statistics
Obs  Name  Dependent Variable  Predicted Value  Residual
1 Alfred 112.5000 124.8686 -12.3686
2 Alice 84.0000 78.6273 5.3727
3 Barbara 98.0000 110.2812 -12.2812
4 Carol 102.5000 102.5670 -0.0670
5 Henry 102.5000 105.0849 -2.5849
6 James 83.0000 80.2266 2.7734
7 Jane 84.5000 89.2191 -4.7191
8 Janet 112.5000 102.7663 9.7337
9 Jeffrey 84.0000 100.2095 -16.2095
10 John 99.5000 86.3415 13.1585
11 Joyce 50.5000 57.3660 -6.8660
12 Judy 90.0000 107.9625 -17.9625
13 Louise 77.0000 76.6295 0.3705
14 Mary 112.0000 117.1544 -5.1544
15 Philip 150.0000 138.2164 11.7836
16 Robert 128.0000 107.2043 20.7957
17 Ronald 133.0000 118.9529 14.0471
18 Thomas 85.0000 79.6676 5.3324
19 William 112.0000 117.1544 -5.1544

Sum of Residuals 0
Sum of Squared Residuals 2120.09974
Predicted Residual SS (PRESS) 3272.72186

The resulting model is identical to the original model specified at the beginning of this section. Notice that the model label is now MODEL1.6. Note that the Weight column does not appear, since all observations have been reweighted to have weight=1.

Now suppose you want only to undo the changes made by the most recent REWEIGHT statement. Use REWEIGHT UNDO for this. The following statements produce Figure 76.44:


reweight r. le -12 or r. ge 12 / weight=.75;
reweight r. le -17 or r. ge 17 / weight=.5;
reweight undo;
print;
run;

Figure 76.44 Example of UNDO in REWEIGHT Statement
The REG Procedure
Model: MODEL1.9
Dependent Variable: Weight

Output Statistics
Obs  Name  Weight Variable  Dependent Variable  Predicted Value  Residual
1 Alfred 0.7500 112.5000 125.1152 -12.6152
2 Alice 1.0000 84.0000 78.7691 5.2309
3 Barbara 0.7500 98.0000 110.3236 -12.3236
4 Carol 1.0000 102.5000 102.8836 -0.3836
5 Henry 1.0000 102.5000 105.3936 -2.8936
6 James 1.0000 83.0000 80.1133 2.8867
7 Jane 1.0000 84.5000 89.0776 -4.5776
8 Janet 1.0000 112.5000 103.3322 9.1678
9 Jeffrey 0.7500 84.0000 100.2835 -16.2835
10 John 0.7500 99.5000 86.2090 13.2910
11 Joyce 1.0000 50.5000 57.0745 -6.5745
12 Judy 0.7500 90.0000 108.2622 -18.2622
13 Louise 1.0000 77.0000 76.5275 0.4725
14 Mary 1.0000 112.0000 117.6752 -5.6752
15 Philip 1.0000 150.0000 138.9211 11.0789
16 Robert 0.7500 128.0000 107.0063 20.9937
17 Ronald 0.7500 133.0000 119.4681 13.5319
18 Thomas 1.0000 85.0000 79.3061 5.6939
19 William 1.0000 112.0000 117.6752 -5.6752

Sum of Residuals 0
Sum of Squared Residuals 1694.87114
Predicted Residual SS (PRESS) 2547.22751

The resulting model reflects changes made only by the first REWEIGHT statement since the third REWEIGHT statement negates the effect of the second REWEIGHT statement. Observations 1, 3, 9, 10, 12, 16, and 17 have their weights changed to 0.75. Note that the label MODEL1.9 reflects the use of nine REWEIGHT statements for the current model.
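To see why these seven observations are selected, the residual conditions can be applied by hand to the residuals of the most recently fit model, which are those shown in Figure 76.43. The following Python sketch is only an illustration outside SAS; the helper name `select_by_abs_residual` is made up here, and the values are transcribed from the Residual column of that figure.

```python
# Residuals from Figure 76.43, keyed by observation number.
residuals = {
    1: -12.3686, 2: 5.3727, 3: -12.2812, 4: -0.0670, 5: -2.5849,
    6: 2.7734, 7: -4.7191, 8: 9.7337, 9: -16.2095, 10: 13.1585,
    11: -6.8660, 12: -17.9625, 13: 0.3705, 14: -5.1544, 15: 11.7836,
    16: 20.7957, 17: 14.0471, 18: 5.3324, 19: -5.1544,
}


def select_by_abs_residual(residuals, cutoff):
    """Mimic a condition of the form 'r. le -cutoff or r. ge cutoff'."""
    return [obs for obs, r in residuals.items() if r <= -cutoff or r >= cutoff]


print(select_by_abs_residual(residuals, 12))  # [1, 3, 9, 10, 12, 16, 17]
print(select_by_abs_residual(residuals, 17))  # [12, 16]
```

Observation 15 (Philip), with a residual of 11.7836, falls just short of the cutoff of 12 and therefore keeps its weight of 1.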

Now suppose you want to reset the observations selected by the most recent REWEIGHT statement to their original weights. Use the REWEIGHT statement with the RESET option to do this. The following statements produce Figure 76.45:

reweight r. le -12 or r. ge 12 / weight=.75;
reweight r. le -17 or r. ge 17 / weight=.5;
reweight / reset;
print;
run;

Figure 76.45 REWEIGHT Statement with RESET Option
The REG Procedure
Model: MODEL1.12
Dependent Variable: Weight

Output Statistics
Obs  Name  Weight Variable  Dependent Variable  Predicted Value  Residual
1 Alfred 0.7500 112.5000 126.0076 -13.5076
2 Alice 1.0000 84.0000 77.8727 6.1273
3 Barbara 0.7500 98.0000 111.2805 -13.2805
4 Carol 1.0000 102.5000 102.4703 0.0297
5 Henry 1.0000 102.5000 105.1278 -2.6278
6 James 1.0000 83.0000 80.2290 2.7710
7 Jane 1.0000 84.5000 89.7199 -5.2199
8 Janet 1.0000 112.5000 102.0122 10.4878
9 Jeffrey 0.7500 84.0000 100.6507 -16.6507
10 John 0.7500 99.5000 86.6828 12.8172
11 Joyce 1.0000 50.5000 56.7703 -6.2703
12 Judy 1.0000 90.0000 108.1649 -18.1649
13 Louise 1.0000 77.0000 76.4327 0.5673
14 Mary 1.0000 112.0000 117.1975 -5.1975
15 Philip 1.0000 150.0000 138.7581 11.2419
16 Robert 1.0000 128.0000 108.7016 19.2984
17 Ronald 0.7500 133.0000 119.0957 13.9043
18 Thomas 1.0000 85.0000 80.3076 4.6924
19 William 1.0000 112.0000 117.1975 -5.1975

Sum of Residuals 0
Sum of Squared Residuals 1879.08980
Predicted Residual SS (PRESS) 2959.57279

Note that the observations that meet the condition of the second REWEIGHT statement (residuals with an absolute value greater than or equal to 17), namely observations 12 and 16, now have their weights reset to the original value of 1. Observations 1, 3, 9, 10, and 17 retain the weight of 0.75 assigned by the first REWEIGHT statement.

Notice how the last three examples show three ways to change weights back to a previous value. In the first example, ALLOBS and the RESET option are used to change the weights of all observations back to their original values. In the second example, the UNDO option is used to negate the effect of the most recent REWEIGHT statement, so the observations it selected return to the weights they had before that statement, which here are the weights assigned by an earlier REWEIGHT statement. In the third example, the RESET option is used without a condition to change the weights of the observations selected by the most recent REWEIGHT statement back to their original values. Finally, note that the label MODEL1.12 indicates that 12 REWEIGHT statements have been applied to the original model.
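The difference between the three approaches can be summarized with a small bookkeeping model. The following Python sketch is a toy illustration only: the class `WeightLedger` and its methods are hypothetical and say nothing about how PROC REG stores weights internally. It assumes the selections stated in the text (seven observations for the cutoff of 12, two for the cutoff of 17) and reproduces the final weights shown in Figures 76.43, 76.44, and 76.45.

```python
class WeightLedger:
    """Toy model of REWEIGHT bookkeeping (hypothetical, not SAS internals)."""

    def __init__(self, n_obs, original=1.0):
        self.original = {obs: original for obs in range(1, n_obs + 1)}
        self.current = dict(self.original)
        self.history = []          # snapshots taken before each change
        self.last_selection = []   # observations hit by the latest condition

    def reweight(self, selection, weight):
        """Like 'reweight <condition> / weight=w' for the selected obs."""
        self.history.append(dict(self.current))
        for obs in selection:
            self.current[obs] = weight
        self.last_selection = list(selection)

    def undo(self):
        """Like 'reweight undo': drop the most recent change."""
        if self.history:
            self.current = self.history.pop()

    def reset(self, selection=None):
        """Like 'reweight / reset'; None reuses the latest selection."""
        chosen = self.last_selection if selection is None else selection
        self.history.append(dict(self.current))
        for obs in chosen:
            self.current[obs] = self.original[obs]

    def reset_all(self):
        """Like 'reweight allobs / reset'."""
        self.reset(list(self.original))

    def changed(self):
        """Observations whose weight differs from the original."""
        return {o: w for o, w in self.current.items() if w != self.original[o]}


SEL_12 = [1, 3, 9, 10, 12, 16, 17]   # |residual| >= 12, as stated in the text
SEL_17 = [12, 16]                    # |residual| >= 17, as stated in the text

undo_case = WeightLedger(19)
undo_case.reweight(SEL_12, 0.75)
undo_case.reweight(SEL_17, 0.5)
undo_case.undo()                     # 12 and 16 go back to 0.75

reset_case = WeightLedger(19)
reset_case.reweight(SEL_12, 0.75)
reset_case.reweight(SEL_17, 0.5)
reset_case.reset()                   # 12 and 16 go back to the original 1.0

all_case = WeightLedger(19)
all_case.reweight(SEL_12, 0.75)
all_case.reset_all()                 # every weight back to 1.0

print(sorted(undo_case.changed()))   # [1, 3, 9, 10, 12, 16, 17]
print(sorted(reset_case.changed()))  # [1, 3, 9, 10, 17]
print(sorted(all_case.changed()))    # []
```

In this model, UNDO depends on the order in which statements were issued, while RESET depends only on which observations are selected; that is why the two end in different weights for observations 12 and 16.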