This example highlights the use of the forward-swap selection method, which is a generalization of the maximum R-square improvement (MAXR) method that is available in the REG procedure in SAS/STAT software. This example also demonstrates the use of the INCLUDE and START options.
The following DATA step produces simulated data in which the response y depends on six main effects and three two-way
interactions from a set of 20 regressors.
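data ex3Data;
   array x{20};                /* regressors x1-x20 */
   do i=1 to 10000;
      do j=1 to 20;
         x{j} = ranuni(1);     /* uniform(0,1) draw for each regressor */
      end;
      /* response: six main effects, three two-way interactions, N(0,1) noise */
      y = 3*x1 + 7*x2 - 5*x3 + 5*x1*x3 + 4*x2*x13 + x7 + x11 - x13 + x1*x4 + rannor(1);
      output;
   end;
run;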
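Written as an equation, the DATA step generates each observation from the model below, where each regressor is an independent uniform draw (RANUNI) and the error is a standard normal draw (RANNOR). This restatement is only a reading aid: it makes explicit the six main effects (x1, x2, x3, x7, x11, x13) and three interactions (x1*x3, x2*x13, x1*x4) that a good selection should recover.

```latex
y = 3x_1 + 7x_2 - 5x_3 + 5x_1x_3 + 4x_2x_{13} + x_7 + x_{11} - x_{13} + x_1x_4 + \varepsilon,
\qquad x_j \sim \mathrm{U}(0,1),\ j = 1,\dots,20,
\qquad \varepsilon \sim \mathrm{N}(0,1)
```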
Suppose you want to find the best model of each size in a range of sizes for predicting the response y. You can use the
forward-swap selection method to produce good models of each size without the computational expense of examining
all possible models of each size. In this example, the criterion used to evaluate the models of each size is the model R-square.
With this criterion, the forward-swap method coincides with the MAXR method that is available in the REG procedure in SAS/STAT
software. The model of a given size for which no pairwise swap of an effect in the model with any candidate effect improves
the R-square value is deemed to be the best model of that size.
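The stopping rule in the preceding paragraph can be stated compactly. In the notation below, which is ours and not the procedure's, M is the current model, I holds the intercept and the effects forced in by INCLUDE=, F = M \ I holds the k effects that may be swapped out, and C holds the N - k eligible effects not in M, so N counts every effect eligible for selection. The cost line is a rough count of models evaluated, not a description of how HPREG organizes its computations, and the values N = 100, k = 10 are hypothetical.

```latex
M \text{ is deemed the best model of its size when }
R^2\bigl((M \setminus \{e\}) \cup \{c\}\bigr) \le R^2(M)
\quad \text{for all } e \in F,\ c \in C

\text{models fit in one pass over all swaps: } k\,(N - k)
\qquad \text{versus} \qquad \binom{N}{k} \text{ for an exhaustive search of that size}

\text{e.g. } N = 100,\ k = 10:\quad 10 \cdot 90 = 900
\quad \text{versus} \quad \binom{100}{10} \approx 1.7 \times 10^{13}
```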
Suppose that you have prior knowledge that the regressors x1, x2, and x3 are needed in modeling the response y. Suppose that
you also believe that some of the two-way interactions of these variables are likely to be important in predicting y and that
some other two-way interactions might also be needed. You can use this prior information by specifying the selection
process shown in the following statements:
proc hpreg data=ex3Data;
   model y = x1|x2|x3|x4|x5|x6|x7|x8|x9|x10|x11|
             x12|x13|x14|x5|x16|x7|x18|x19|x20@2
           / include=(x1 x2 x3) start=(x1*x2 x1*x3 x2*x3);
   selection method=forwardswap(select=rsquare maxef=15 choose=sbc)
             details=all;
run;
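The MODEL statement specifies that the main effects and two-way interactions of the listed regressors are candidates for
selection. Because x5 and x7 each appear twice in the bar list, where x15 and x17 would be expected, x15 and x17 are not
candidates and the quadratic effects x5*x5 and x7*x7 are; this is why effects such as x5*x5 and x7*x7 appear in the output.
The INCLUDE= option specifies that the effects x1, x2, and x3 must appear in all models that are examined. The START= option
specifies that all the two-way interactions of these variables should be used in the initial model that is considered, but
that these interactions are eligible for removal during the forward-swap selection.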
The “Selection Summary” table is shown in Output 12.3.1.
Output 12.3.1: Selection Summary
Selection Summary

| Step | Effect Entered | Effect Removed | Number Effects In | SBC | Model R-Square |
|---|---|---|---|---|---|
| 0 | Intercept | | 1 | | |
| | x1 | | 2 | | |
| | x2 | | 3 | | |
| | x1*x2 | | 4 | | |
| | x3 | | 5 | | |
| | x1*x3 | | 6 | | |
| | x2*x3 | | 7 | 3307.6836 | 0.8837 |
| 1 | x2*x13 | | 8 | 1892.8403 | 0.8992 |
| 2 | x7*x11 | x1*x2 | 8 | 618.9298 | 0.9112 |
| 3 | x1*x4 | x2*x3 | 8 | 405.3751 | 0.9131 |
| 4 | x13 | | 9 | 213.6140 | 0.9148 |
| 5 | x7 | | 10 | 180.4457 | 0.9152 |
| 6 | x11 | x7*x11 | 10 | 1.4039* | 0.9167 |
| 7 | x10*x11 | | 11 | 2.3393 | 0.9168 |
| 8 | x3*x7 | | 12 | 4.5000 | 0.9168 |
| 9 | x6*x7 | | 13 | 10.0589 | 0.9169 |
| 10 | x3*x6 | | 14 | 13.1113 | 0.9169 |
| 11 | x5*x20 | | 15 | 19.4612 | 0.9169 |
| 12 | x13*x20 | x3*x6 | 15 | 18.3678 | 0.9169 |
| 13 | x5*x5 | x6*x7 | 15 | 12.1398 | 0.9170* |

\* Optimal Value of Criterion

You see that, starting from the step 0 model, which contains the intercept and the effects specified in the INCLUDE= and
START= options, the forward-swap selection method adds the effect x2*x13 at step 1 because this yields the maximum improvement
in R-square that can be obtained by adding a single effect. The forward-swap selection method then evaluates whether any effect
swap yields a better eight-effect model (one with a higher R-square value). Because you specified the DETAILS=ALL option in the
SELECTION statement, at each step where a swap is made you obtain a “Candidates” table that shows the R-square values for the
evaluated swaps. Output 12.3.2 shows the “Candidates” table for step 2. By default, only the best 10 swaps are displayed.
Output 12.3.2: Swap Candidates at Step 2
Best 10 Candidates

| Rank | Effect Dropped | Effect Added | R-Square |
|---|---|---|---|
| 1 | x1*x2 | x7*x11 | 0.9112 |
| 2 | x2*x3 | x7*x11 | 0.9112 |
| 3 | x1*x2 | x7 | 0.9065 |
| 4 | x2*x3 | x7 | 0.9065 |
| 5 | x1*x2 | x7*x7 | 0.9060 |
| 6 | x2*x3 | x7*x7 | 0.9060 |
| 7 | x1*x2 | x4*x7 | 0.9060 |
| 8 | x2*x3 | x4*x7 | 0.9060 |
| 9 | x1*x2 | x11 | 0.9058 |
| 10 | x2*x3 | x11 | 0.9058 |

You see that the best swap adds x7*x11 and drops x1*x2. This yields an eight-effect model whose R-square value (0.9112) is
larger than the R-square value (0.8992) of the eight-effect model at step 1. Hence this swap is made at step 2. At step 3,
an even better eight-effect model than the model at step 2 is obtained by dropping x2*x3 and adding x1*x4. No additional swap
improves the R-square value, and so the model at step 3 is deemed to be the best eight-effect model.
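The two swaps at steps 2 and 3 raise the R-square value of the eight-effect model while its size stays fixed. Read from the Model R-Square column of Output 12.3.1, the sequence is:

```latex
\underbrace{0.8992}_{\text{step 1}}
\;<\; \underbrace{0.9112}_{\text{step 2: add } x_7x_{11},\ \text{drop } x_1x_2}
\;<\; \underbrace{0.9131}_{\text{step 3: add } x_1x_4,\ \text{drop } x_2x_3}
```

Ranks 1 and 2 in Output 12.3.2 share the value 0.9112 at the displayed precision; the procedure made the rank-1 swap, and x2*x3, the effect dropped in the rank-2 swap, is removed at step 3 instead.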
Although this is the best eight-effect model that can be found by this method given the starting model, it is not guaranteed
to have the highest R-square value among all possible models that consist of seven effects and an intercept.
Because the DETAILS=ALL option is specified in the SELECTION statement, details for the model at each step of the selection process are displayed. Output 12.3.3 provides details of the model at step 3.
Output 12.3.3: Model Details at Step 3
Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 7 | 108630 | 15519 | 15000.3 | <.0001 |
| Error | 9992 | 10337 | 1.03455 | | |
| Corrected Total | 9999 | 118967 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 1.01713 |
| R-Square | 0.91311 |
| Adj R-Sq | 0.91305 |
| AIC | 10350 |
| AICC | 10350 |
| SBC | 405.37511 |
| ASE | 1.03373 |

Parameter Estimates

| Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 0.012095 | 0.045712 | 0.26 | 0.7913 |
| x1 | 1 | 3.087078 | 0.076390 | 40.41 | <.0001 |
| x2 | 1 | 7.775180 | 0.046815 | 166.08 | <.0001 |
| x3 | 1 | -4.957140 | 0.070995 | -69.82 | <.0001 |
| x1*x3 | 1 | 4.910115 | 0.122503 | 40.08 | <.0001 |
| x1*x4 | 1 | 0.890436 | 0.060523 | 14.71 | <.0001 |
| x7*x11 | 1 | 1.708469 | 0.045939 | 37.19 | <.0001 |
| x2*x13 | 1 | 2.584078 | 0.061506 | 42.01 | <.0001 |

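Most of the fit statistics for the step 3 model can be recomputed from its Analysis of Variance table, with n = 10000 observations and p = 8 parameters (the intercept plus seven effects, so n - p = 9992 is the error DF). The SBC line assumes the usual definition n ln(SSE/n) + p ln(n); it reproduces the printed 405.37511 only up to the rounding of the displayed sums of squares.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{10337}{118967} \approx 0.91311

R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p}
= 1 - 0.08689 \cdot \frac{9999}{9992} \approx 0.91305

\text{Root MSE} = \sqrt{\frac{\mathrm{SSE}}{n - p}} = \sqrt{\frac{10337}{9992}} \approx 1.0171,
\qquad
\mathrm{ASE} = \frac{\mathrm{SSE}}{n} = \frac{10337}{10000} \approx 1.0337

\mathrm{SBC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p \ln n
\approx 10000 \ln(1.03373) + 8 \ln(10000) \approx 331.7 + 73.7 \approx 405.4
```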
The forward-swap method continues to find the best nine-effect model, the best 10-effect model, and so on, until it obtains the best 15-effect model. At this point the selection terminates because you specified the MAXEF=15 option in the SELECTION statement. The R-square value never decreases from one step to the next and reaches its largest value (0.9170) at step 13. However, because you specified the CHOOSE=SBC option in the SELECTION statement, the final model selected is the model at step 6, which has the smallest SBC value (1.4039) of all the models examined.
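Tracing the entries and removals in Output 12.3.1 through step 6 gives the selected model below; its nine effects are exactly the terms that the DATA step uses to generate y. The second display, again assuming SBC = n ln(SSE/n) + p ln(n) with p equal to the number of effects in the model (each effect here has one degree of freedom), shows why SBC prefers this 10-effect model to the 15-effect model at step 13 even though R-square is slightly higher there.

```latex
\text{Step 6 model: } \{\text{Intercept},\ x_1,\ x_2,\ x_3,\ x_1x_3,\ x_2x_{13},\ x_1x_4,\ x_{13},\ x_7,\ x_{11}\}

\mathrm{SBC}_{13} - \mathrm{SBC}_{6}
= n \ln\frac{\mathrm{SSE}_{13}}{\mathrm{SSE}_{6}} + (15 - 10)\ln n

12.1398 - 1.4039 = 10.7359
\;\Rightarrow\;
n \ln\frac{\mathrm{SSE}_{13}}{\mathrm{SSE}_{6}} \approx 10.736 - 5\ln(10000) \approx 10.736 - 46.052 \approx -35.3
```

The five extra effects lower the fit term by only about 35, while the penalty grows by about 46, so SBC rises; this is consistent with R-square moving only from 0.9167 to 0.9170.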