Reweighting Observations in an Analysis
Reweighting observations is an interactive feature of PROC REG that enables you to change the weights of the observations used to compute the regression equation. You can also delete observations from the analysis (but not from the data set) by changing their weights to zero. The following statements use the Class data from the section Getting Started: REG Procedure to illustrate some features of the REWEIGHT statement. First, the full model is fit, and the residuals are displayed in Figure 76.40.
```sas
proc reg data=Class;
   model Weight=Age Height / p;
   id Name;
run;
```

Output Statistics

Obs | Name | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|
1 | Alfred | 112.5000 | 124.8686 | -12.3686 |
2 | Alice | 84.0000 | 78.6273 | 5.3727 |
3 | Barbara | 98.0000 | 110.2812 | -12.2812 |
4 | Carol | 102.5000 | 102.5670 | -0.0670 |
5 | Henry | 102.5000 | 105.0849 | -2.5849 |
6 | James | 83.0000 | 80.2266 | 2.7734 |
7 | Jane | 84.5000 | 89.2191 | -4.7191 |
8 | Janet | 112.5000 | 102.7663 | 9.7337 |
9 | Jeffrey | 84.0000 | 100.2095 | -16.2095 |
10 | John | 99.5000 | 86.3415 | 13.1585 |
11 | Joyce | 50.5000 | 57.3660 | -6.8660 |
12 | Judy | 90.0000 | 107.9625 | -17.9625 |
13 | Louise | 77.0000 | 76.6295 | 0.3705 |
14 | Mary | 112.0000 | 117.1544 | -5.1544 |
15 | Philip | 150.0000 | 138.2164 | 11.7836 |
16 | Robert | 128.0000 | 107.2043 | 20.7957 |
17 | Ronald | 133.0000 | 118.9529 | 14.0471 |
18 | Thomas | 85.0000 | 79.6676 | 5.3324 |
19 | William | 112.0000 | 117.1544 | -5.1544 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 2120.09974 |
Predicted Residual SS (PRESS) | 3272.72186 |
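
The following sketch cross-checks the arithmetic behind Figure 76.40 in Python with NumPy. It is an illustration, not SAS code. It assumes that Age and Height are the standard SASHELP.CLASS values, which this section does not list. The Weight values are the Dependent Variable column above. PRESS is computed with the usual leave-one-out formula, the sum of (r_i / (1 - h_ii))^2, where h_ii is the leverage of observation i.

```python
import numpy as np

# ASSUMPTION: Age and Height are the standard SASHELP.CLASS values, in Obs order
# 1..19 (Alfred ... William); weight is the response (Dependent Variable) above.
age = np.array([14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
                11, 14, 12, 15, 16, 12, 15, 11, 15], dtype=float)
height = np.array([69.0, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59.0,
                   51.3, 64.3, 56.3, 66.5, 72.0, 64.8, 67.0, 57.5, 66.5])
weight = np.array([112.5, 84.0, 98.0, 102.5, 102.5, 83.0, 84.5, 112.5, 84.0, 99.5,
                   50.5, 90.0, 77.0, 112.0, 150.0, 128.0, 133.0, 85.0, 112.0])

# Design matrix for Weight = b0 + b1*Age + b2*Height
X = np.column_stack([np.ones_like(age), age, height])

# Ordinary least squares: every observation has weight 1
beta, *_ = np.linalg.lstsq(X, weight, rcond=None)
pred = X @ beta
resid = weight - pred

# Leverages: diagonal of the hat matrix H = X (X'X)^-1 X'
leverage = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)

sse = float(np.sum(resid ** 2))
# PRESS sums the squared leave-one-out (deleted) residuals r_i / (1 - h_ii)
press = float(np.sum((resid / (1.0 - leverage)) ** 2))

print(f"Sum of Squared Residuals      {sse:.5f}")
print(f"Predicted Residual SS (PRESS) {press:.5f}")
```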
Upon examining the data and residuals, you realize that observation 17 (Ronald) was mistakenly included in the analysis. You also want to examine the effect of reweighting to 0.5 the observations whose residuals have an absolute value greater than or equal to 17. The following statements request this reweighting:
```sas
reweight obs.=17;                            /* excludes observation 17 */
reweight r. le -17 or r. ge 17 / weight=0.5;
print p;
run;
```
At this point, a message in the log reports which observations have been reweighted and what their new weights are. The PRINT statement then refits the model and produces Figure 76.41.

Output Statistics

Obs | Name | Weight Variable | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|---|
1 | Alfred | 1.0000 | 112.5000 | 121.6250 | -9.1250 |
2 | Alice | 1.0000 | 84.0000 | 79.9296 | 4.0704 |
3 | Barbara | 1.0000 | 98.0000 | 107.5484 | -9.5484 |
4 | Carol | 1.0000 | 102.5000 | 102.1663 | 0.3337 |
5 | Henry | 1.0000 | 102.5000 | 104.3632 | -1.8632 |
6 | James | 1.0000 | 83.0000 | 79.9762 | 3.0238 |
7 | Jane | 1.0000 | 84.5000 | 87.8225 | -3.3225 |
8 | Janet | 1.0000 | 112.5000 | 103.6889 | 8.8111 |
9 | Jeffrey | 1.0000 | 84.0000 | 98.7606 | -14.7606 |
10 | John | 1.0000 | 99.5000 | 85.3117 | 14.1883 |
11 | Joyce | 1.0000 | 50.5000 | 58.6811 | -8.1811 |
12 | Judy | 0.5000 | 90.0000 | 106.8740 | -16.8740 |
13 | Louise | 1.0000 | 77.0000 | 76.8377 | 0.1623 |
14 | Mary | 1.0000 | 112.0000 | 116.2429 | -4.2429 |
15 | Philip | 1.0000 | 150.0000 | 135.9688 | 14.0312 |
16 | Robert | 0.5000 | 128.0000 | 103.5150 | 24.4850 |
17 | Ronald | 0 | 133.0000 | 117.8121 | 15.1879 |
18 | Thomas | 1.0000 | 85.0000 | 78.1398 | 6.8602 |
19 | William | 1.0000 | 112.0000 | 116.2429 | -4.2429 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 1500.61194 |
Predicted Residual SS (PRESS) | 2287.57621 |
The first REWEIGHT statement excludes observation 17, and the second REWEIGHT statement reweights observations 12 and 16 to 0.5. An important feature to note from this example is that the model is not refit until after the PRINT statement. REWEIGHT statements do not cause the model to be refit, so several REWEIGHT statements can accumulate before the next model is fit. As a consequence, the residuals tested by the second REWEIGHT statement are those of the full model in Figure 76.40, not those of a model fit without observation 17.
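
A weighted fit minimizes the sum of w_i times the squared residual. An observation with weight 0 therefore no longer affects the estimates, but it stays in the data and still receives a predicted value. The Python sketch below (not SAS) mimics the statements that produced Figure 76.41. Both REWEIGHT conditions are evaluated on the residuals of the full fit, and the model is refit only once, at PRINT. The helper `wls_fit` is hypothetical, and the Age and Height values are assumed to be the standard SASHELP.CLASS values. The weighted Sum of Squared Residuals is taken to be the sum of w_i r_i^2, which is consistent with the value printed above.

```python
import numpy as np

# ASSUMPTION: Age and Height are the standard SASHELP.CLASS values, in Obs order
# 1..19; weight is the response (Dependent Variable), not a regression weight.
age = np.array([14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
                11, 14, 12, 15, 16, 12, 15, 11, 15], dtype=float)
height = np.array([69.0, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59.0,
                   51.3, 64.3, 56.3, 66.5, 72.0, 64.8, 67.0, 57.5, 66.5])
weight = np.array([112.5, 84.0, 98.0, 102.5, 102.5, 83.0, 84.5, 112.5, 84.0, 99.5,
                   50.5, 90.0, 77.0, 112.0, 150.0, 128.0, 133.0, 85.0, 112.0])
X = np.column_stack([np.ones_like(age), age, height])


def wls_fit(X, y, w):
    """Hypothetical helper: minimize sum(w_i * (y_i - x_i @ b)**2).

    Each row is scaled by sqrt(w_i); a row with w_i = 0 becomes all zeros and
    therefore contributes nothing to the estimates.
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


w = np.ones(len(weight))                      # original weights
resid = weight - X @ wls_fit(X, weight, w)    # residuals of the model in effect

# reweight obs.=17;  -> observation 17 gets weight 0
w[17 - 1] = 0.0
# reweight r. le -17 or r. ge 17 / weight=0.5;
# No refit has happened yet, so R. still means the residuals of the full fit.
selected = np.flatnonzero((resid <= -17) | (resid >= 17)) + 1   # 1-based Obs numbers
w[selected - 1] = 0.5

# print p;  -> only now is the model refit, using all accumulated weights
beta = wls_fit(X, weight, w)
resid = weight - X @ beta
sse = float(np.sum(w * resid ** 2))           # weighted SSE; weight-0 rows add nothing

# Weight 0 gives the same estimates as dropping the row from the fit
keep = w > 0
beta_dropped = wls_fit(X[keep], weight[keep], w[keep])

print("reweighted to 0.5:", selected.tolist())
print(f"weighted SSE = {sse:.5f}")
```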
In this example, the intent is to reweight the observations with large residuals. The observation that was mistakenly included should therefore be deleted first. The model should then be refit to the remaining observations, and only after that should the observations with large residuals be reweighted. Use the REFIT statement to accomplish this. Note that the model label has changed from MODEL1 to MODEL1.2 because two REWEIGHT statements have been used. The following statements produce Figure 76.42:
```sas
reweight allobs / weight=1.0;
reweight obs.=17;
refit;                                       /* refit before the residual test */
reweight r. le -17 or r. ge 17 / weight=.5;
print;
run;
```

Output Statistics

Obs | Name | Weight Variable | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|---|
1 | Alfred | 1.0000 | 112.5000 | 120.9716 | -8.4716 |
2 | Alice | 1.0000 | 84.0000 | 79.5342 | 4.4658 |
3 | Barbara | 1.0000 | 98.0000 | 107.0746 | -9.0746 |
4 | Carol | 1.0000 | 102.5000 | 101.5681 | 0.9319 |
5 | Henry | 1.0000 | 102.5000 | 103.7588 | -1.2588 |
6 | James | 1.0000 | 83.0000 | 79.7204 | 3.2796 |
7 | Jane | 1.0000 | 84.5000 | 87.5443 | -3.0443 |
8 | Janet | 1.0000 | 112.5000 | 102.9467 | 9.5533 |
9 | Jeffrey | 1.0000 | 84.0000 | 98.3117 | -14.3117 |
10 | John | 1.0000 | 99.5000 | 85.0407 | 14.4593 |
11 | Joyce | 1.0000 | 50.5000 | 58.6253 | -8.1253 |
12 | Judy | 1.0000 | 90.0000 | 106.2625 | -16.2625 |
13 | Louise | 1.0000 | 77.0000 | 76.5908 | 0.4092 |
14 | Mary | 1.0000 | 112.0000 | 115.4651 | -3.4651 |
15 | Philip | 1.0000 | 150.0000 | 134.9953 | 15.0047 |
16 | Robert | 0.5000 | 128.0000 | 103.1923 | 24.8077 |
17 | Ronald | 0 | 133.0000 | 117.0299 | 15.9701 |
18 | Thomas | 1.0000 | 85.0000 | 78.0288 | 6.9712 |
19 | William | 1.0000 | 112.0000 | 115.4651 | -3.4651 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 1637.81879 |
Predicted Residual SS (PRESS) | 2473.87984 |
Notice that this results in a slightly different model from the one produced by the previous statements: only observation 16 is reweighted to 0.5. Also note that the model label is now MODEL1.5, since five REWEIGHT statements have been used for this model (the REFIT statement is not counted).
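
For comparison, this Python sketch (not SAS) follows the REFIT version. Because the model is refit after observation 17 is excluded, the residual condition is tested against the new fit, and only observation 16 qualifies. The helper `wls_fit` is hypothetical, and the Age and Height values are assumed to be the standard SASHELP.CLASS values.

```python
import numpy as np

# ASSUMPTION: Age and Height are the standard SASHELP.CLASS values, in Obs order
# 1..19; weight is the response (Dependent Variable), not a regression weight.
age = np.array([14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
                11, 14, 12, 15, 16, 12, 15, 11, 15], dtype=float)
height = np.array([69.0, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59.0,
                   51.3, 64.3, 56.3, 66.5, 72.0, 64.8, 67.0, 57.5, 66.5])
weight = np.array([112.5, 84.0, 98.0, 102.5, 102.5, 83.0, 84.5, 112.5, 84.0, 99.5,
                   50.5, 90.0, 77.0, 112.0, 150.0, 128.0, 133.0, 85.0, 112.0])
X = np.column_stack([np.ones_like(age), age, height])


def wls_fit(X, y, w):
    """Hypothetical helper: weighted least squares via rows scaled by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# reweight allobs / weight=1.0;  reweight obs.=17;
w = np.ones(len(weight))
w[17 - 1] = 0.0
# refit;  -> R. now refers to the model fit without observation 17
resid = weight - X @ wls_fit(X, weight, w)
# reweight r. le -17 or r. ge 17 / weight=.5;
selected = np.flatnonzero(np.abs(resid) >= 17) + 1             # 1-based Obs numbers
w[selected - 1] = 0.5

# print;  -> refit with the accumulated weights
resid = weight - X @ wls_fit(X, weight, w)
sse = float(np.sum(w * resid ** 2))
print("reweighted to 0.5:", selected.tolist())
print(f"weighted SSE = {sse:.5f}")
```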
Another important feature of the REWEIGHT statement is the ability to nullify the effect of the most recent REWEIGHT statement or of all previous REWEIGHT statements. First, suppose that several REWEIGHT statements are in effect and you want to restore the original weights of all the observations. The following REWEIGHT statement accomplishes this and produces Figure 76.43:
```sas
reweight allobs / reset;
print;
run;
```

Output Statistics

Obs | Name | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|
1 | Alfred | 112.5000 | 124.8686 | -12.3686 |
2 | Alice | 84.0000 | 78.6273 | 5.3727 |
3 | Barbara | 98.0000 | 110.2812 | -12.2812 |
4 | Carol | 102.5000 | 102.5670 | -0.0670 |
5 | Henry | 102.5000 | 105.0849 | -2.5849 |
6 | James | 83.0000 | 80.2266 | 2.7734 |
7 | Jane | 84.5000 | 89.2191 | -4.7191 |
8 | Janet | 112.5000 | 102.7663 | 9.7337 |
9 | Jeffrey | 84.0000 | 100.2095 | -16.2095 |
10 | John | 99.5000 | 86.3415 | 13.1585 |
11 | Joyce | 50.5000 | 57.3660 | -6.8660 |
12 | Judy | 90.0000 | 107.9625 | -17.9625 |
13 | Louise | 77.0000 | 76.6295 | 0.3705 |
14 | Mary | 112.0000 | 117.1544 | -5.1544 |
15 | Philip | 150.0000 | 138.2164 | 11.7836 |
16 | Robert | 128.0000 | 107.2043 | 20.7957 |
17 | Ronald | 133.0000 | 118.9529 | 14.0471 |
18 | Thomas | 85.0000 | 79.6676 | 5.3324 |
19 | William | 112.0000 | 117.1544 | -5.1544 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 2120.09974 |
Predicted Residual SS (PRESS) | 3272.72186 |
The resulting model is identical to the original model specified at the beginning of this section. Notice that the model label is now MODEL1.6. The Weight Variable column no longer appears, because all observations have been restored to their original weight of 1.
Now suppose you want to undo only the changes made by the most recent REWEIGHT statement. Use REWEIGHT UNDO for this. The following statements produce Figure 76.44:
```sas
reweight r. le -12 or r. ge 12 / weight=.75;
reweight r. le -17 or r. ge 17 / weight=.5;
reweight undo;                               /* cancels the previous statement */
print;
run;
```

Output Statistics

Obs | Name | Weight Variable | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|---|
1 | Alfred | 0.7500 | 112.5000 | 125.1152 | -12.6152 |
2 | Alice | 1.0000 | 84.0000 | 78.7691 | 5.2309 |
3 | Barbara | 0.7500 | 98.0000 | 110.3236 | -12.3236 |
4 | Carol | 1.0000 | 102.5000 | 102.8836 | -0.3836 |
5 | Henry | 1.0000 | 102.5000 | 105.3936 | -2.8936 |
6 | James | 1.0000 | 83.0000 | 80.1133 | 2.8867 |
7 | Jane | 1.0000 | 84.5000 | 89.0776 | -4.5776 |
8 | Janet | 1.0000 | 112.5000 | 103.3322 | 9.1678 |
9 | Jeffrey | 0.7500 | 84.0000 | 100.2835 | -16.2835 |
10 | John | 0.7500 | 99.5000 | 86.2090 | 13.2910 |
11 | Joyce | 1.0000 | 50.5000 | 57.0745 | -6.5745 |
12 | Judy | 0.7500 | 90.0000 | 108.2622 | -18.2622 |
13 | Louise | 1.0000 | 77.0000 | 76.5275 | 0.4725 |
14 | Mary | 1.0000 | 112.0000 | 117.6752 | -5.6752 |
15 | Philip | 1.0000 | 150.0000 | 138.9211 | 11.0789 |
16 | Robert | 0.7500 | 128.0000 | 107.0063 | 20.9937 |
17 | Ronald | 0.7500 | 133.0000 | 119.4681 | 13.5319 |
18 | Thomas | 1.0000 | 85.0000 | 79.3061 | 5.6939 |
19 | William | 1.0000 | 112.0000 | 117.6752 | -5.6752 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 1694.87114 |
Predicted Residual SS (PRESS) | 2547.22751 |
The resulting model reflects only the changes made by the first REWEIGHT statement, because the third REWEIGHT statement negates the effect of the second. Observations 1, 3, 9, 10, 12, 16, and 17 have their weights changed to 0.75. Note that the label MODEL1.9 reflects the use of nine REWEIGHT statements for the current model.
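
One way to picture UNDO is as restoring a saved copy of the weights. The Python sketch below (not SAS) replays the three statements that produced Figure 76.44 under that model. The `saved` list and the `wls_fit` helper are illustrative assumptions, not a description of how PROC REG stores its state. Age and Height are assumed to be the standard SASHELP.CLASS values.

```python
import numpy as np

# ASSUMPTION: Age and Height are the standard SASHELP.CLASS values, in Obs order
# 1..19; weight is the response (Dependent Variable), not a regression weight.
age = np.array([14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
                11, 14, 12, 15, 16, 12, 15, 11, 15], dtype=float)
height = np.array([69.0, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59.0,
                   51.3, 64.3, 56.3, 66.5, 72.0, 64.8, 67.0, 57.5, 66.5])
weight = np.array([112.5, 84.0, 98.0, 102.5, 102.5, 83.0, 84.5, 112.5, 84.0, 99.5,
                   50.5, 90.0, 77.0, 112.0, 150.0, 128.0, 133.0, 85.0, 112.0])
X = np.column_stack([np.ones_like(age), age, height])


def wls_fit(X, y, w):
    """Hypothetical helper: weighted least squares via rows scaled by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# State after "reweight allobs / reset; print;": all weights 1, original model
w = np.ones(len(weight))
resid = weight - X @ wls_fit(X, weight, w)

saved = []                                   # weights saved before each REWEIGHT
# reweight r. le -12 or r. ge 12 / weight=.75;
saved.append(w.copy())
w[np.abs(resid) >= 12] = 0.75
# reweight r. le -17 or r. ge 17 / weight=.5;  (no refit: same residuals)
saved.append(w.copy())
w[np.abs(resid) >= 17] = 0.5
# reweight undo;  -> restore the weights saved before the latest statement
w = saved.pop()

# print;  -> refit with the surviving weights
resid = weight - X @ wls_fit(X, weight, w)
sse = float(np.sum(w * resid ** 2))
print("weight 0.75:", (np.flatnonzero(w == 0.75) + 1).tolist())
print(f"weighted SSE = {sse:.5f}")
```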
Now suppose you want to reset the observations selected by the most recent REWEIGHT statement to their original weights. Use the REWEIGHT statement with the RESET option to do this. The following statements produce Figure 76.45:
```sas
reweight r. le -12 or r. ge 12 / weight=.75;
reweight r. le -17 or r. ge 17 / weight=.5;
reweight / reset;                  /* reset obs selected by previous statement */
print;
run;
```

Output Statistics

Obs | Name | Weight Variable | Dependent Variable | Predicted Value | Residual |
---|---|---|---|---|---|
1 | Alfred | 0.7500 | 112.5000 | 126.0076 | -13.5076 |
2 | Alice | 1.0000 | 84.0000 | 77.8727 | 6.1273 |
3 | Barbara | 0.7500 | 98.0000 | 111.2805 | -13.2805 |
4 | Carol | 1.0000 | 102.5000 | 102.4703 | 0.0297 |
5 | Henry | 1.0000 | 102.5000 | 105.1278 | -2.6278 |
6 | James | 1.0000 | 83.0000 | 80.2290 | 2.7710 |
7 | Jane | 1.0000 | 84.5000 | 89.7199 | -5.2199 |
8 | Janet | 1.0000 | 112.5000 | 102.0122 | 10.4878 |
9 | Jeffrey | 0.7500 | 84.0000 | 100.6507 | -16.6507 |
10 | John | 0.7500 | 99.5000 | 86.6828 | 12.8172 |
11 | Joyce | 1.0000 | 50.5000 | 56.7703 | -6.2703 |
12 | Judy | 1.0000 | 90.0000 | 108.1649 | -18.1649 |
13 | Louise | 1.0000 | 77.0000 | 76.4327 | 0.5673 |
14 | Mary | 1.0000 | 112.0000 | 117.1975 | -5.1975 |
15 | Philip | 1.0000 | 150.0000 | 138.7581 | 11.2419 |
16 | Robert | 1.0000 | 128.0000 | 108.7016 | 19.2984 |
17 | Ronald | 0.7500 | 133.0000 | 119.0957 | 13.9043 |
18 | Thomas | 1.0000 | 85.0000 | 80.3076 | 4.6924 |
19 | William | 1.0000 | 112.0000 | 117.1975 | -5.1975 |

Statistic | Value |
---|---|
Sum of Residuals | 0 |
Sum of Squared Residuals | 1879.08980 |
Predicted Residual SS (PRESS) | 2959.57279 |
Note that the observations that meet the condition of the second REWEIGHT statement (residuals with an absolute value greater than or equal to 17) now have their weights reset to their original value of 1. Observations 12 and 16 therefore have weight 1, while observations 1, 3, 9, 10, and 17 keep weights of 0.75.
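
RESET differs from UNDO. It does not restore the weights from before the previous statement. Instead, it gives the observations selected by that statement their original weights, which are all 1 here. The Python sketch below (not SAS) starts from the weights in effect for Figure 76.44 and replays the statements that produced Figure 76.45. The `wls_fit` helper and the Age and Height values (assumed to be the standard SASHELP.CLASS values) are assumptions.

```python
import numpy as np

# ASSUMPTION: Age and Height are the standard SASHELP.CLASS values, in Obs order
# 1..19; weight is the response (Dependent Variable), not a regression weight.
age = np.array([14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
                11, 14, 12, 15, 16, 12, 15, 11, 15], dtype=float)
height = np.array([69.0, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59.0,
                   51.3, 64.3, 56.3, 66.5, 72.0, 64.8, 67.0, 57.5, 66.5])
weight = np.array([112.5, 84.0, 98.0, 102.5, 102.5, 83.0, 84.5, 112.5, 84.0, 99.5,
                   50.5, 90.0, 77.0, 112.0, 150.0, 128.0, 133.0, 85.0, 112.0])
X = np.column_stack([np.ones_like(age), age, height])


def wls_fit(X, y, w):
    """Hypothetical helper: weighted least squares via rows scaled by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# State after Figure 76.44: weight 0.75 for Obs 1, 3, 9, 10, 12, 16, 17
original = np.ones(len(weight))
w = original.copy()
w[np.array([1, 3, 9, 10, 12, 16, 17]) - 1] = 0.75
resid = weight - X @ wls_fit(X, weight, w)

# reweight r. le -12 or r. ge 12 / weight=.75;
first = np.abs(resid) >= 12
w[first] = 0.75
# reweight r. le -17 or r. ge 17 / weight=.5;
last = np.abs(resid) >= 17                   # observations this statement selected
w[last] = 0.5
# reweight / reset;  -> those observations get their ORIGINAL weights, not 0.75
w[last] = original[last]

# print;
resid = weight - X @ wls_fit(X, weight, w)
sse = float(np.sum(w * resid ** 2))
print("weight 0.75:", (np.flatnonzero(w == 0.75) + 1).tolist())
print(f"weighted SSE = {sse:.5f}")
```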
Notice that the last three examples show three ways to change weights back to a previous value. In the first example, ALLOBS and the RESET option change the weights of all observations back to their original values. In the second example, the UNDO option negates the effect of the previous REWEIGHT statement, returning the observations it selected to the weights they had before it, which in this case were set by the first REWEIGHT statement. In the third example, the RESET option changes the weights of the observations selected in the previous REWEIGHT statement back to their original values. Finally, note that the label MODEL1.12 indicates that 12 REWEIGHT statements have been applied to the original model.
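
The label arithmetic and the three ways of restoring weights can be summarized in a small bookkeeping model. The class `ReweightHistory` below is hypothetical Python, not part of SAS. It tracks only weights and the model label and fits nothing. It uses the Obs numbers that the residual conditions selected in the outputs above instead of evaluating the conditions. It also assumes that repeated UNDO statements would step back one statement at a time, which this section does not show.

```python
class ReweightHistory:
    """Hypothetical bookkeeping model of interactive REWEIGHT statements.

    Tracks weights and the model label only; it does not fit a model.
    Observation numbers are 1-based, as in the Obs column of the output.
    """

    def __init__(self, n_obs, base_label="MODEL1"):
        self.original = [1.0] * n_obs    # no WEIGHT statement: every weight starts at 1
        self.weights = list(self.original)
        self.history = []                # (weights before, selected obs), per statement
        self.count = 0                   # REWEIGHT statements applied so far
        self.base_label = base_label

    @property
    def label(self):
        return f"{self.base_label}.{self.count}" if self.count else self.base_label

    def _apply(self, obs, new_weight):
        self.history.append((list(self.weights), list(obs)))
        for i in obs:
            self.weights[i - 1] = new_weight(i)
        self.count += 1

    def reweight(self, obs, weight=0.0):
        """REWEIGHT selection / WEIGHT=weight; without WEIGHT= the weight becomes 0."""
        self._apply(obs, lambda i: weight)

    def reset(self, obs=None):
        """REWEIGHT selection / RESET; with no selection, reuse the previous one."""
        if obs is None:
            obs = self.history[-1][1]
        self._apply(obs, lambda i: self.original[i - 1])

    def undo(self):
        """REWEIGHT UNDO; ASSUMPTION: each UNDO pops exactly one earlier statement."""
        before, _ = self.history.pop()
        self.weights = before
        self.count += 1


n = 19
allobs = list(range(1, n + 1))
large12 = [1, 3, 9, 10, 12, 16, 17]      # |residual| >= 12 in the fits shown above
h = ReweightHistory(n)
labels = []

# Figure 76.41
h.reweight([17])
h.reweight([12, 16], 0.5)
labels.append(h.label)
# Figure 76.42: REFIT is not a REWEIGHT statement, so it does not change the label
h.reweight(allobs, 1.0)
h.reweight([17])
h.reweight([16], 0.5)
labels.append(h.label)
# Figure 76.43: reweight allobs / reset;
h.reset(allobs)
labels.append(h.label)
# Figure 76.44: ... reweight undo;
h.reweight(large12, 0.75)
h.reweight([12, 16], 0.5)
h.undo()
labels.append(h.label)
after_undo = list(h.weights)
# Figure 76.45: ... reweight / reset;
h.reweight(large12, 0.75)
h.reweight([12, 16], 0.5)
h.reset()
labels.append(h.label)

print(labels)
print([i for i in allobs if h.weights[i - 1] == 0.75])
```

Keeping one snapshot and one selection per statement is what lets both UNDO and a RESET without a selection refer back to the previous statement in this sketch.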