The HPREG Procedure

Example 12.4 Forward Selection with Screening

This example shows how you can use the SCREEN option in the SELECTION statement to greatly speed up model selection from a large number of regressors. To demonstrate the efficacy of model selection with screening, this example uses simulated data in which the response y depends systematically on a relatively small subset of a much larger set of regressors. The complete set of regressors is described in Table 12.8.

Table 12.8: Complete Set of Regressors

Regressor Name        Type              Number of Levels    In True Model
xIn1–xIn25            Continuous        -                   Yes
xWeakIn1–xWeakIn2     Continuous        -                   Yes
xOut1–xOut500         Continuous        -                   No
cIn1–cIn5             Classification    From two to five    Yes
cOut1–cOut500         Classification    From two to five    No


The labels In and Out, which are part of the variable names, make it easy to identify whether the selected model succeeds or fails in capturing the true underlying model. The regressors that are labeled xWeakIn1 and xWeakIn2 are predictive, but their influence is substantially smaller than the influence of the other regressors in the true model.

The following DATA step generates the data:

  %let nObs       = 50000;
  %let nContIn    = 25;
  %let nContOut   = 500;
  %let nClassIn   = 5;
  %let nClassOut  = 500;
  %let maxLevs    = 5;
  %let noiseScale = 1;

  data ex4Data;
     array xIn{&nContIn};
     array xOut{&nContOut};
     array cIn{&nClassIn};
     array cOut{&nClassOut};

     drop i j sign nLevs xBeta;

     do i=1 to &nObs;
        /* continuous regressors in the true model: xIn{j} enters  */
        /* with coefficient j*sign, where sign alternates from -1  */
        sign  = -1;
        xBeta = 0;
        do j=1 to dim(xIn);
           xIn{j} = ranuni(1);
           xBeta  = xBeta + j*sign*xIn{j};
           sign   = -sign;
        end;

        /* continuous noise regressors */
        do j=1 to dim(xOut);
           xOut{j} = ranuni(1);
        end;

        /* weakly predictive continuous regressors */
        xWeakIn1 = ranuni(1);
        xWeakin2 = ranuni(1);
        xBeta    = xBeta + 0.1*xWeakIn1 + 0.1*xWeakIn2;

        /* classification regressors in the true model (2 to 5 levels) */
        do j=1 to dim(cIn);
           nLevs  = 2 + mod(j,&maxLevs-1);
           cIn{j} = 1+int(ranuni(1)*nLevs);
           xBeta  = xBeta + j*sign*(cIn{j}-nLevs/2);
           sign   = -sign;
        end;

        /* classification noise regressors */
        do j=1 to dim(cOut);
           nLevs   = 2 + mod(j,&maxLevs-1);
           cOut{j} = 1+int(ranuni(1)*nLevs);
        end;

        /* response = true linear predictor + Gaussian noise */
        y = xBeta + &noiseScale*rannor(1);
        output;
     end;
  run;
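To make the true model explicit, the following DATA step lists the coefficients that the generator assigns to the continuous regressors in the true model. It is a sketch, and the data set name trueCoef is hypothetical. It mirrors the sign logic of the generator: sign starts at -1 and flips after each term, so xIn{j} receives coefficient j*(-1)**j.

```sas
/* Sketch: list the coefficients that the ex4Data DATA step assigns to */
/* the continuous regressors in the true model. trueCoef is a          */
/* hypothetical data set name.                                         */
data trueCoef;
   length effect $ 8;
   keep effect coef;
   sign = -1;                    /* same starting sign as the generator   */
   do j = 1 to 25;               /* 25 = &nContIn                          */
      effect = cats('xIn', j);
      coef   = j*sign;           /* xIn{j} enters xBeta as j*sign*xIn{j}   */
      output;
      sign   = -sign;            /* sign alternates after each term        */
   end;
   effect = 'xWeakIn1'; coef = 0.1; output;
   effect = 'xWeakIn2'; coef = 0.1; output;
run;

proc print data=trueCoef noobs;
run;
```

Among xIn1–xIn25, the listing shows that xIn1 has the smallest coefficient in magnitude (-1) and xIn25 has the largest (-25). Each of xWeakIn1 and xWeakIn2 has coefficient 0.1. These magnitudes help explain which true effects are hardest for screening to detect.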

When you have insufficient prior knowledge of which effects need to be included in a parsimonious predictive model, a reasonable starting point is to use model selection to build such a model. You might want to consider a large number of candidate effects, even though you expect that a model that generalizes well for predicting unseen data depends on a relatively small number of them. In such cases, you can dramatically reduce the computational task by including screening in the model selection process. The following statements show how to do this:


 proc hpreg data=ex4Data;
     class c: ;
     model y = x: c: ;
     selection method=forward screen(details=all)=100 20;
     performance details;
 run;

The ordered pair of integers that is specified in the SCREEN option in the SELECTION statement requests that screening be used to reduce the set of regressors to 100 regressors at the first screening stage and to 20 regressors at the second screening stage. This information is reflected in the Screening Information table shown in Output 12.4.1.

Output 12.4.1: Screening Information

The HPREG Procedure

Screening Information
Screening Stages                      Multiple
Screening Criterion                   Maximum Absolute Correlation
Stage 1 Number of Screened Effects    100
Stage 2 Number of Screened Effects    20


The Number of Observations table in Output 12.4.2 confirms that the data contain 50,000 observations. The Dimensions table shows that the selection is from 1,033 effects (1,032 regressors plus the intercept) that have a total of 2,295 parameters.

Output 12.4.2: Number of Observations and Dimensions

Number of Observations Read    50000
Number of Observations Used    50000

Dimensions
Number of Effects              1033
Number of Parameters           2295
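The counts in the Dimensions table follow directly from the DATA step. The following sketch reproduces them under two assumptions: the intercept and each continuous effect contribute one parameter, and each classification effect contributes one parameter per level (a less-than-full-rank parameterization). The data set name dims is hypothetical.

```sas
/* Sketch: reproduce the Number of Effects and Number of Parameters   */
/* values. Assumes one parameter per classification level.            */
data dims;                              /* hypothetical data set name */
   keep nEffects nParms;
   /* intercept + 25 xIn + 2 xWeakIn + 500 xOut + 5 cIn + 500 cOut */
   nEffects = 1 + 25 + 2 + 500 + 5 + 500;
   /* intercept and each continuous effect: one parameter apiece */
   nParms = 1 + 25 + 2 + 500;
   /* classification effect j has 2 + mod(j, 4) levels, as in the  */
   /* DATA step (mod(j, &maxLevs-1) with maxLevs = 5)              */
   do j = 1 to 5;                       /* cIn1-cIn5     */
      nParms = nParms + 2 + mod(j, 4);
   end;
   do j = 1 to 500;                     /* cOut1-cOut500 */
      nParms = nParms + 2 + mod(j, 4);
   end;
   put nEffects= nParms=;
run;
```

The PUT statement writes nEffects=1033 nParms=2295 to the log. These values match Output 12.4.2, which is consistent with the one-parameter-per-level assumption.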


Because you specified the DETAILS=ALL suboption of the SCREEN option, you obtain the Screening table in Output 12.4.3, which shows how the screened subset of 100 effects is obtained at the first screening stage. For display purposes, some ranks in this table have been suppressed.

Output 12.4.3: First Stage Screening Details

Effect Screening for Response

Rank   Effect     Maximum Absolute Correlation
1      xIn25      0.31785
2      xIn24      0.30697
3      xIn23      0.29734
.      .          .
.      .          .
98     xOut338    0.00932
99     cOut363    0.00922
100    cOut194    0.00920
101    xOut125    0.00919*
102    xOut220    0.00916*
103    cOut310    0.00916*
104    cOut49     0.00915*
105    cOut11     0.00915*

* Screened Out


The Screened Effects table shown in Output 12.4.4 lists the effects from which a model is selected at the first screening stage.

Output 12.4.4: First Stage Screened Effects

Screened Effects: xIn25 xIn24 xIn23 xIn22 xIn21 xIn20 xIn19 xIn18 xIn17 xIn16 xIn15 xIn14 xIn13 cIn5 xIn12 cIn3 xIn11 xIn10 xIn8 xIn9 xIn7 cIn4 cIn2 xIn6 xIn5 xIn4 xIn3 cIn1 xIn2 cOut498 cOut110 cOut450 cOut441 cOut272 xOut82 cOut45 cOut6 cOut281 cOut134 cOut15 xOut310 xOut252 xOut485 xOut365 cOut138 cOut123 cOut337 cOut195 cOut423 cOut283 cOut62 cOut114 xOut489 cOut14 cOut158 cOut437 xOut64 cOut301 cOut311 cOut187 cOut431 cOut464 cOut388 cOut213 cOut46 xOut329 cOut403 cOut305 cOut171 cOut85 cOut99 cOut249 xOut267 cOut455 cOut457 cOut271 cOut78 xOut93 cOut259 cOut417 cOut258 cOut326 cOut291 cOut263 cOut107 cOut402 cOut17 cOut237 cOut129 cOut198 cOut58 cOut428 cOut135 cOut206 cOut139 cOut113 cOut486 xOut338 cOut363 cOut194


You see that the magnitudes of the pairwise correlations of xIn1, xWeakIn1, and xWeakIn2 with the response are too small for those effects to be included as candidates for selection at the first screening stage.
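To inspect the first-stage criterion for a few continuous effects directly, you can compute their Pearson correlations with y by using PROC CORR. This sketch assumes that, for a continuous effect, the screening criterion is the absolute value of the ordinary Pearson correlation with the response. Classification effects are not covered here.

```sas
/* Sketch: Pearson correlations of selected continuous effects with y. */
/* Assumption: for a continuous effect, the screening criterion is the */
/* absolute value of this correlation.                                 */
proc corr data=ex4Data noprob nosimple;
   var y;
   with xIn25 xIn1 xWeakIn1 xWeakIn2;
run;
```

If the assumption holds, the absolute value for xIn25 should match the 0.31785 shown in Output 12.4.3. The values for xIn1, xWeakIn1, and xWeakIn2 should be smaller in magnitude than the first-stage cutoff of 0.00920 (rank 100).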

The first stage continues with forward selection from the screened effects that are shown in Output 12.4.4. The effects in the selected model at this stage are shown in Output 12.4.5.

Output 12.4.5: First Stage Selected Effects

Selected Effects: Intercept xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 cIn1 cIn2 cIn3 cIn4 cIn5


You see that the selected model at this stage includes only effects that are systematically related to the response. Suppose you had requested single-stage screening by specifying the SINGLESTAGE suboption of the SCREEN option. In that case, the selected model at this stage would have been the final selected model. However, this example uses multistage screening. The second stage repeats the steps of the first stage, except that the response that is modeled is the residuals from the model selected at the first stage.
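For comparison, a single-stage run might look like the following sketch. SINGLESTAGE is the suboption named above. Writing it as screen(singlestage)=100, with only one screening size, is an assumption about the syntax, so check the SELECTION statement documentation before you rely on it.

```sas
/* Sketch: single-stage screening. The form screen(singlestage)=100 */
/* is an assumed syntax; verify it against the SELECTION statement  */
/* documentation for PROC HPREG.                                    */
proc hpreg data=ex4Data;
   class c: ;
   model y = x: c: ;
   selection method=forward screen(singlestage)=100;
   performance details;
run;
```

If this run behaves as described above, its final model should be the first-stage model in Output 12.4.5, which omits xIn1, xWeakIn1, and xWeakIn2.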

Output 12.4.6 shows the screening details at the second stage. You see that 20 effects are chosen by screening at this stage, as specified. The residuals from the first stage are orthogonal to the effects selected at that stage, so none of those effects is in the screened subset. Furthermore, xIn1, xWeakIn1, and xWeakIn2 are only weakly correlated with y, yet they are the effects most strongly correlated with the residuals from the first stage.

Output 12.4.6: Second Stage Screening Details

Screening Stage 2: Residual Fit

Effect Screening for Stage 1 Residuals

Rank   Effect     Maximum Absolute Correlation
1      xIn1       0.27373
2      xWeakIn1   0.02352
3      xWeakin2   0.02132
4      cOut295    0.01524
5      cOut35     0.01443
6      cOut323    0.01417
7      cOut202    0.01406
8      xOut6      0.01401
9      cOut154    0.01263
10     cOut54     0.01160
11     cOut181    0.01159
12     cOut115    0.01150
13     cOut403    0.01144
14     xOut332    0.01142
15     xOut409    0.01141
16     cOut267    0.01137
17     cOut374    0.01132
18     cOut254    0.01128
19     xOut204    0.01121
20     cOut147    0.01120
21     xOut113    0.01116*
22     xOut427    0.01115*
23     cOut259    0.01111*
24     cOut170    0.01106*
25     cOut107    0.01102*

* Screened Out


Screened Effects: xIn1 xWeakIn1 xWeakin2 cOut295 cOut35 cOut323 cOut202 xOut6 cOut154 cOut54 cOut181 cOut115 cOut403 xOut332 xOut409 cOut267 cOut374 cOut254 xOut204 cOut147
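The correlations in Output 12.4.3 and Output 12.4.6 can be roughly anticipated from the DATA step. The following back-of-the-envelope calculation is a sketch under three assumptions. First, the regressors are mutually independent, as they are by construction. Second, it uses population variances: 1/12 for a U(0,1) regressor and (L²-1)/12 for a classification effect with L equally likely levels. Third, it ignores sampling error. For an effect x with coefficient β, the magnitude of the correlation ρ with a response of standard deviation σ is approximately |β|σ_x/σ.

$$
\sigma_y^2 \approx \sum_{j=1}^{25}\frac{j^2}{12} + \sum_{j=1}^{5} j^2\,\frac{L_j^2-1}{12} + 2\cdot\frac{0.1^2}{12} + 1
\approx 460.42 + 44.33 + 0.0017 + 1 \approx 505.75,
\qquad L_j = 2 + (j \bmod 4)
$$

$$
\sigma_y \approx 22.49,\qquad
|\rho(\mathtt{xIn25}, y)| \approx \frac{25/\sqrt{12}}{22.49} \approx 0.321,\qquad
|\rho(\mathtt{xIn1}, y)| \approx \frac{1/\sqrt{12}}{22.49} \approx 0.013,\qquad
|\rho(\mathtt{xWeakIn1}, y)| \approx \frac{0.1/\sqrt{12}}{22.49} \approx 0.0013
$$

The first stage removes xIn2–xIn25 and cIn1–cIn5, so up to a constant the residual is approximately r ≈ -xIn1 + 0.1 xWeakIn1 + 0.1 xWeakIn2 + ε:

$$
\sigma_r^2 \approx \frac{1}{12} + 2\cdot\frac{0.1^2}{12} + 1 \approx 1.085,\qquad
|\rho(\mathtt{xIn1}, r)| \approx \frac{1/\sqrt{12}}{1.042} \approx 0.277,\qquad
|\rho(\mathtt{xWeakIn1}, r)| \approx \frac{0.1/\sqrt{12}}{1.042} \approx 0.028
$$

These approximations agree reasonably well with the observed 0.31785 for xIn25 and 0.27373 for xIn1. With n = 50,000, the sampling standard error of a correlation near zero is roughly 1/√n ≈ 0.0045. An expected value of about 0.013 for xIn1 can therefore fall below the first-stage cutoff of 0.00920, as it did here, and the expected values for the weak effects lie well below that cutoff. Removing the strong effects shrinks the standard deviation of the modeled response from about 22.5 to about 1.04. That is why these three effects rise to the top at the second stage.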


Output 12.4.7 shows the selected effects at the second screening stage. They include all the remaining effects that are systematically predictive of y but were not in the screened subset at the first screening stage (xIn1, xWeakIn1, and xWeakIn2). They also include one noise effect, xOut6.

Output 12.4.7: Second Stage Selected Effects

Selected Effects: Intercept xIn1 xOut6 xWeakIn1 xWeakin2


In the third and final screening stage, model selection is performed from the union of the screened effects from the first stage (which are shown in Output 12.4.4) and the selected effects from the second stage (which are shown in Output 12.4.7). The selected effects from this final stage are shown in Output 12.4.8.

Output 12.4.8: Final Stage Selected Effects

Selected Effects: Intercept xIn1 xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 xOut6 xWeakIn1 xWeakin2 cIn1 cIn2 cIn3 cIn4 cIn5


You see that the final selected model contains all the true underlying model effects and just one noise effect (xOut6). Because you specified the DETAILS option in the PERFORMANCE statement, the Timing table shown in Output 12.4.9 is displayed.

Output 12.4.9: Timing for Model Selection with Screening

Procedure Task Timing
Task                               Seconds    Percent
Reading and Levelizing Data        3.65       18.15%
Loading Design Matrix              3.59       17.84%
Computing Moments                  2.82       14.04%
Computing Cross Products Matrix    3.40       16.91%
Performing Model Selection         6.64       33.05%


You see that the selected model was obtained by selecting from more than 1,000 effects (2,295 parameters). Even so, screening enabled the entire modeling task to be completed in about 20 seconds. You can perform the same model selection without screening by using the following statements:


 proc hpreg data=ex4Data;
     class c: ;
     model y = x: c: ;
     selection method=forward;
     performance details;
 run;

In this case, the model that is selected without screening is identical to the model that is obtained with screening. However, in general there is no guarantee that the two approaches select identical models. Output 12.4.10 shows the Timing table for the model selection without screening.

Output 12.4.10: Timing for Model Selection without Screening

Procedure Task Timing
Task                               Seconds    Percent
Reading and Levelizing Data        3.42       1.18%
Loading Design Matrix              0.83       0.29%
Computing Moments                  0.47       0.16%
Computing Cross Products Matrix    110.18     38.09%
Performing Model Selection         174.41     60.29%


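You see that the model selection without screening took about 290 seconds, roughly 14 times longer than the approximately 20 seconds it took when screening was included in the selection process. Most of the difference comes from computing the cross products matrix and performing model selection. Without screening, those two tasks together took about 285 seconds.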
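The totals and the ratio quoted above are simple arithmetic on the Seconds columns. The following sketch recomputes them from the values in Output 12.4.9 and Output 12.4.10. The data set name timingSummary is hypothetical, and timings on your own system will differ because they depend on the computing environment.

```sas
/* Sketch: total the task times from Output 12.4.9 and Output 12.4.10 */
/* and compute their ratio. timingSummary is a hypothetical name.     */
data timingSummary;
   withScreen    = sum(3.65, 3.59, 2.82, 3.40, 6.64);       /* seconds */
   withoutScreen = sum(3.42, 0.83, 0.47, 110.18, 174.41);   /* seconds */
   ratio         = withoutScreen / withScreen;
   put withScreen= withoutScreen= ratio= 6.2;
run;
```

The log reports totals of about 20.1 and 289.3 seconds, for a ratio of about 14.4.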