The HPREG Procedure

Example 60.3 Forward-Swap Selection

This example highlights the use of the forward-swap selection method, which is a generalization of the maximum R-square improvement (MAXR) method that is available in the REG procedure in SAS/STAT software. This example also demonstrates the use of the INCLUDE and START options.

The following DATA step generates simulated data in which the response y depends on six main effects and three two-way interactions from a set of 20 regressors.

  
  data ex3Data;
     array x{20};
     do i=1 to 10000;
        do j=1 to 20;
           x{j} = ranuni(1);
        end;
        y = 3*x1 + 7*x2 -5*x3 + 5*x1*x3 + 
            4*x2*x13 + x7 + x11 -x13  + x1*x4 + rannor(1);
        output;
     end;
  run;

Suppose you want to find the best model of each size in a range of sizes for predicting the response y. You can use the forward-swap selection method to produce good models of each size without the computational expense of examining all possible models of each size. In this example, the criterion used to evaluate the models of each size is the model R-square value. With this criterion, the forward-swap method coincides with the MAXR method that is available in the REG procedure in SAS/STAT software. A model of a given size is deemed to be the best model of that size when no pairwise swap of an effect in the model with any candidate effect improves the R-square value.
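For comparison, the MAXR method mentioned above can be requested in PROC REG roughly as follows. This is only a sketch: the MODEL statement in PROC REG takes variables rather than crossed effects, so the search is restricted here to the 20 main effects, and the STOP=15 value is an illustrative choice rather than part of this example.

```sas
/* Sketch: MAXR selection in PROC REG on the main effects only.     */
/* STOP=15 (illustrative) ends the search at the 15-variable model. */
proc reg data=ex3Data;
   model y = x1-x20 / selection=maxr stop=15;
run;
quit;
```

To let PROC REG consider interactions, you would first need to create the product variables in a DATA step.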

Suppose that you have prior knowledge that the regressors x1, x2, and x3 are needed in modeling the response y. Suppose that you also believe that some of the two-way interactions of these variables are likely to be important in predicting y and that some other two-way interactions might also be needed. You can use this prior information by specifying the selection process shown in the following statements:


  proc hpreg data=ex3Data;
     model y = x1|x2|x3|x4|x5|x6|x7|x8|x9|x10|x11|
               x12|x13|x14|x5|x16|x7|x18|x19|x20@2 /
               include=(x1 x2 x3) start=(x1*x2 x1*x3 x2*x3);
     selection method=forwardswap(select=rsquare maxef=15 choose=sbc)
               details=all;
  run;

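The MODEL statement uses the bar operator with the @2 modifier, so that the listed main effects and their two-way interactions are candidates for selection. As written, x5 and x7 each appear twice in the list (in the positions where x15 and x17 might be expected), which is consistent with the squared terms x5*x5 and x7*x7 appearing among the candidates in the output. The INCLUDE= option specifies that the effects x1, x2, and x3 must appear in all models that are examined. The START= option specifies that all the two-way interactions of these variables are used in the initial model, but that these interactions remain eligible for removal during the forward-swap selection.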

The "Selection Summary" table is shown in Output 60.3.1.

Output 60.3.1: Selection Summary

The HPREG Procedure

Selection Summary

Step  Effect Entered  Effect Removed  Number Effects In        SBC   Model R-Square
   0  Intercept                                       1
      x1                                              2
      x2                                              3
      x1*x2                                           4
      x3                                              5
      x1*x3                                           6
      x2*x3                                           7  3307.6836           0.8837
   1  x2*x13                                          8  1892.8403           0.8992
   2  x7*x11          x1*x2                           8   618.9298           0.9112
   3  x1*x4           x2*x3                           8   405.3751           0.9131
   4  x13                                             9   213.6140           0.9148
   5  x7                                             10   180.4457           0.9152
   6  x11             x7*x11                         10     1.4039*          0.9167
   7  x10*x11                                        11     2.3393           0.9168
   8  x3*x7                                          12     4.5000           0.9168
   9  x6*x7                                          13    10.0589           0.9169
  10  x3*x6                                          14    13.1113           0.9169
  11  x5*x20                                         15    19.4612           0.9169
  12  x13*x20         x3*x6                          15    18.3678           0.9169
  13  x5*x5           x6*x7                          15    12.1398           0.9170*

* Optimal Value of Criterion




Starting at step 0 from the model that contains an intercept and the effects specified in the INCLUDE= and START= options, the forward-swap selection method adds the effect x2*x13 at step 1, because this yields the maximum improvement in R-square that can be obtained by adding a single effect. The method then evaluates whether any effect swap yields a better eight-effect model (one with a higher R-square value). Because you specified the DETAILS=ALL option in the SELECTION statement, at each step where a swap is made you obtain a "Candidates" table that shows the R-square values for the evaluated swaps. Output 60.3.2 shows the "Candidates" table for step 2. By default, only the best 10 swaps are displayed.

Output 60.3.2: Swap Candidates at Step 2

Best 10 Candidates

Rank  Effect Dropped  Effect Added  R-Square
   1  x1*x2           x7*x11          0.9112
   2  x2*x3           x7*x11          0.9112
   3  x1*x2           x7              0.9065
   4  x2*x3           x7              0.9065
   5  x1*x2           x7*x7           0.9060
   6  x2*x3           x7*x7           0.9060
   7  x1*x2           x4*x7           0.9060
   8  x2*x3           x4*x7           0.9060
   9  x1*x2           x11             0.9058
  10  x2*x3           x11             0.9058



You see that the best swap adds x7*x11 and drops x1*x2. This yields an eight-effect model whose R-square value (0.9112) is larger than the R-square value (0.8992) of the eight-effect model at step 1. Hence this swap is made at step 2. At step 3, an even better eight-effect model is obtained by dropping x2*x3 and adding x1*x4. No additional swap improves the R-square value, and so the model at step 3 is deemed to be the best eight-effect model. Although this is the best eight-effect model that this method can find from the given starting model, it is not guaranteed to have the highest R-square value among all possible models that consist of seven effects and an intercept.

Because the DETAILS=ALL option is specified in the SELECTION statement, details for the model at each step of the selection process are displayed. Output 60.3.3 provides details of the model at step 3.

Output 60.3.3: Model Details at Step 3

Analysis of Variance

Source              DF  Sum of Squares  Mean Square  F Value  Pr > F
Model                7          108630        15519  15000.3  <.0001
Error             9992           10337      1.03455
Corrected Total   9999          118967

Root MSE     1.01713
R-Square     0.91311
Adj R-Sq     0.91305
AIC            10350
AICC           10350
SBC        405.37511
ASE          1.03373

Parameter Estimates

Parameter  DF   Estimate  Standard Error  t Value  Pr > |t|
Intercept   1   0.012095        0.045712     0.26    0.7913
x1          1   3.087078        0.076390    40.41    <.0001
x2          1   7.775180        0.046815   166.08    <.0001
x3          1  -4.957140        0.070995   -69.82    <.0001
x1*x3       1   4.910115        0.122503    40.08    <.0001
x1*x4       1   0.890436        0.060523    14.71    <.0001
x7*x11      1   1.708469        0.045939    37.19    <.0001
x2*x13      1   2.584078        0.061506    42.01    <.0001
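
The fit statistics in Output 60.3.3 can be cross-checked from the analysis of variance table. The following DATA step is a minimal sketch that recomputes them from the displayed (rounded) sums of squares, so the results should agree with the table only approximately. The variable names are illustrative, and the SBC expression n*log(SSE/n) + p*log(n) is an assumption made here rather than something stated in this example.

```sas
data _null_;
   ssm = 108630;   /* model sum of squares (as displayed)   */
   sse = 10337;    /* error sum of squares (as displayed)   */
   sst = 118967;   /* corrected total sum of squares        */
   n   = 10000;    /* number of observations                */
   p   = 8;        /* parameters, including the intercept   */

   rsq     = ssm / sst;                          /* R-Square  */
   adjrsq  = 1 - (n - 1) / (n - p) * (1 - rsq);  /* Adj R-Sq  */
   rootmse = sqrt(sse / (n - p));                /* Root MSE  */
   ase     = sse / n;                            /* ASE       */
   sbc     = n * log(ase) + p * log(n);          /* assumed SBC formula */

   put rsq= adjrsq= rootmse= ase= sbc=;          /* write results to the log */
run;
```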



The forward-swap method continues in this way to find the best nine-effect model, the best 10-effect model, and so on until it obtains the best 15-effect model. At this point the selection terminates because you specified the MAXEF=15 option in the SELECTION statement. The R-square value increases at each step of the selection process. However, because you specified the CHOOSE=SBC criterion in the SELECTION statement, the final model selected is the model at step 6, which has the smallest SBC value among the models examined (marked with an asterisk in Output 60.3.1).
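
Tracing the entries and removals in Output 60.3.1 through step 6 gives a model that contains the main effects x1, x2, x3, x7, x11, and x13 and the interactions x1*x3, x1*x4, and x2*x13, which are the terms used to generate y in the DATA step. As a sketch, assuming you want to refit just that model without any selection, you could list those effects explicitly:

```sas
/* Refit the step 6 model directly; no SELECTION statement is used. */
proc hpreg data=ex3Data;
   model y = x1 x2 x3 x7 x11 x13 x1*x3 x1*x4 x2*x13;
run;
```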