The HPREG Procedure

Example 12.4 Forward Selection with Screening

This example shows how you can use the SCREEN option in the SELECTION statement to greatly speed up model selection from a large number of regressors. To demonstrate the efficacy of model selection with screening, this example uses simulated data in which the response y depends systematically on a relatively small subset of a much larger set of regressors. The complete set of regressors is described in Table 12.8.

Table 12.8: Complete Set of Regressors

Regressor Name	Type	Number of Levels	In True Model
`xIn1–xIn25`	Continuous		Yes
`xWeakIn1–xWeakIn2`	Continuous		Yes
`xOut1–xOut500`	Continuous		No
`cIn1–cIn5`	Classification	From two to five	Yes
`cOut1–cOut500`	Classification	From two to five	No

The labels In and Out, which are part of the variable names, make it easy to identify whether the selected model succeeds or fails in capturing the true underlying model. The regressors that are labeled xWeakIn1 and xWeakIn2 are predictive, but their influence is substantially smaller than the influence of the other regressors in the true model.

The following DATA step generates the data:

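  %let nObs       = 50000;  /* number of observations                        */
  %let nContIn    = 25;     /* continuous regressors in the true model       */
  %let nContOut   = 500;    /* continuous noise regressors                   */
  %let nClassIn   = 5;      /* classification effects in the true model      */
  %let nClassOut  = 500;    /* classification noise effects                  */
  %let maxLevs    = 5;      /* maximum number of levels of a class variable  */
  %let noiseScale = 1;      /* standard deviation of the error term          */

  data ex4Data;
     array xIn{&nContIn};
     array xOut{&nContOut};
     array cIn{&nClassIn};
     array cOut{&nClassOut};

     drop i j sign nLevs xBeta;

     do i=1 to &nObs;
        sign  = -1;
        xBeta = 0;

        /* Continuous effects in the true model: coefficient j, alternating sign */
        do j=1 to dim(xIn);
           xIn{j} = ranuni(1);
           xBeta  = xBeta + j*sign*xIn{j};
           sign   = -sign;
        end;

        /* Continuous noise regressors: unrelated to y */
        do j=1 to dim(xOut);
           xOut{j} = ranuni(1);
        end;

        /* Weak continuous effects: coefficient 0.1 */
        xWeakIn1 = ranuni(1);
        xWeakin2 = ranuni(1);

        xBeta  = xBeta + 0.1*xWeakIn1 + 0.1*xWeakIn2;

        /* Classification effects in the true model: two to five levels each */
        do j=1 to dim(cIn);
           nLevs  = 2 + mod(j,&maxlevs-1);
           cIn{j} = 1+int(ranuni(1)*nLevs);
           xBeta  = xBeta + j*sign*(cIn{j}-nLevs/2);
           sign   = -sign;
        end;

        /* Classification noise effects: unrelated to y */
        do j=1 to dim(cOut);
           nLevs  = 2 + mod(j,&maxlevs-1);
           cOut{j} = 1+int(ranuni(1)*nLevs);
        end;

        /* Response: linear predictor plus normal noise */
        y = xBeta + &noiseScale*rannor(1);

        output;
     end;
  run;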
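
Before you run the selection, it can be worth confirming that the simulated data look as intended. The following statements are a minimal sketch and are not part of the original example. They summarize the response and a few regressors and tabulate the levels of the classification variables in the true model. Because the DATA step computes nLevs as 2 + mod(j,4), the frequency tables for cIn1 through cIn5 should show 3, 4, 5, 2, and 3 levels, respectively.

  /* Sketch: summarize the response and selected continuous regressors */
  proc means data=ex4Data n mean std min max;
     var y xIn1 xIn25 xWeakIn1 xWeakIn2;
  run;

  /* Sketch: tabulate the levels of the true-model classification variables */
  proc freq data=ex4Data;
     tables cIn1-cIn5 / nocum;
  run;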

When you have insufficient prior knowledge of which effects need to be included in a parsimonious predictive model, a reasonable starting point is to use model selection to build such a model. You might want to consider a large number of possible model effects, even though you know that a successful model that generalizes well for predicting unseen data depends on a relatively small number of effects. In such cases, you can dramatically reduce the computational task by including screening in the model selection process. The following statements show how you do this:


 proc hpreg data=ex4Data;
     class c: ;
     model y = x: c: ;
     selection method=forward screen(details=all)=100 20;
     performance details;
 run;

The ordered pair of integers that is specified in the SCREEN option in the SELECTION statement requests that screening be used to reduce the set of regressors to 100 regressors at the first screening stage and to 20 regressors at the second screening stage. This information is reflected in the “Screening Information” table shown in Output 12.4.1.

Output 12.4.1: Screening Information

The HPREG Procedure

Screening Information
Screening Stages	Multiple
Screening Criterion	Maximum Absolute Correlation
Stage 1 Number of Screened Effects	100
Stage 2 Number of Screened Effects	20

The “Number of Observations” table in Output 12.4.2 confirms that the data contain 50,000 observations, and the “Dimensions” table shows that the selection is from 1,033 effects that have a total of 2,295 parameters.

Output 12.4.2: Number of Observations and Dimensions

Number of Observations Read	50000
Number of Observations Used	50000

Dimensions
Number of Effects	1033
Number of Parameters	2295
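
The counts in the “Dimensions” table can be reconciled with Table 12.8 and the DATA step that generates the data. The following DATA step is an illustrative sketch and is not part of the original example. It assumes that the intercept counts as one effect and that each classification effect contributes one parameter per level; this assumption is consistent with the reported totals.

  data _null_;
     nCont    = 25 + 2 + 500;        /* xIn, xWeakIn, and xOut regressors */
     nClass   = 5 + 500;             /* cIn and cOut effects              */
     nEffects = 1 + nCont + nClass;  /* plus the intercept                */

     nParms = 1 + nCont;             /* intercept + one per continuous regressor */
     do j = 1 to 5;                  /* levels of cIn1-cIn5                      */
        nParms = nParms + 2 + mod(j, 5-1);
     end;
     do j = 1 to 500;                /* levels of cOut1-cOut500                  */
        nParms = nParms + 2 + mod(j, 5-1);
     end;

     put nEffects= nParms=;          /* writes nEffects=1033 nParms=2295 to the log */
  run;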

Because you specified the DETAILS=ALL suboption of the SCREEN option, you obtain the “Screening” table in Output 12.4.3, which shows how the screened subset of 100 effects is obtained at the first screening stage. For display purposes, some ranks in this table have been suppressed.

Output 12.4.3: First Stage Screening Details

Effect Screening for Response
Rank	Effect	Maximum Absolute Correlation
1	xIn25	0.31785
2	xIn24	0.30697
3	xIn23	0.29734
.	.	.
.	.	.
98	xOut338	0.00932
99	cOut363	0.00922
100	cOut194	0.00920
101	xOut125	0.00919*
102	xOut220	0.00916*
103	cOut310	0.00916*
104	cOut49	0.00915*
105	cOut11	0.00915*

* Screened Out

The “Screened Effects” table shown in Output 12.4.4 lists the effects from which a model is selected at the first screening stage.

Output 12.4.4: First Stage Screened Effects

Screened Effects:	xIn25 xIn24 xIn23 xIn22 xIn21 xIn20 xIn19 xIn18 xIn17 xIn16 xIn15 xIn14 xIn13 cIn5 xIn12 cIn3 xIn11 xIn10 xIn8 xIn9 xIn7 cIn4 cIn2 xIn6 xIn5 xIn4 xIn3 cIn1 xIn2 cOut498 cOut110 cOut450 cOut441 cOut272 xOut82 cOut45 cOut6 cOut281 cOut134 cOut15 xOut310 xOut252 xOut485 xOut365 cOut138 cOut123 cOut337 cOut195 cOut423 cOut283 cOut62 cOut114 xOut489 cOut14 cOut158 cOut437 xOut64 cOut301 cOut311 cOut187 cOut431 cOut464 cOut388 cOut213 cOut46 xOut329 cOut403 cOut305 cOut171 cOut85 cOut99 cOut249 xOut267 cOut455 cOut457 cOut271 cOut78 xOut93 cOut259 cOut417 cOut258 cOut326 cOut291 cOut263 cOut107 cOut402 cOut17 cOut237 cOut129 cOut198 cOut58 cOut428 cOut135 cOut206 cOut139 cOut113 cOut486 xOut338 cOut363 cOut194

You see that the magnitudes of the pairwise correlations of the effects xIn1, xWeakIn1, and xWeakIn2 with the response are too small for those effects to be included as candidates for selection at the first screening stage.
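
A rough calculation suggests why these effects rank so low. The following DATA step is a sketch and is not part of the original example. It assumes that the screening statistic for a continuous effect is the absolute Pearson correlation with y, and it uses the population variances implied by the data-generating DATA step: 1/12 for a uniform variate and (nLevs**2 - 1)/12 for a classification variable whose levels 1 through nLevs are equally likely.

  data _null_;
     varU = 1/12;                       /* variance of a uniform(0,1) variate  */
     varY = 1;                          /* error term: noiseScale = 1          */
     do j = 1 to 25;                    /* xIn1-xIn25, coefficients +/- j      */
        varY = varY + j**2 * varU;
     end;
     varY = varY + 2 * 0.1**2 * varU;   /* xWeakIn1 and xWeakIn2               */
     do j = 1 to 5;                     /* cIn1-cIn5, coefficients +/- j       */
        nLevs = 2 + mod(j, 5-1);
        varY  = varY + j**2 * (nLevs**2 - 1)/12;
     end;

     /* Population correlation of a uniform regressor with y: |coefficient| * sd(x) / sd(y) */
     corrIn25 = 25  * sqrt(varU / varY);
     corrIn1  = 1   * sqrt(varU / varY);
     corrWeak = 0.1 * sqrt(varU / varY);
     put corrIn25= 6.4 corrIn1= 6.4 corrWeak= 6.4;
  run;

Under these assumptions, the population correlations are about 0.32 for xIn25, 0.013 for xIn1, and 0.0013 for the two weak effects. Sample correlations from 50,000 observations vary around these values by roughly 1/sqrt(50000), or about 0.0045, so it is plausible for xIn1 to fall below the first-stage cutoff of about 0.0092, and the weak effects are expected to fall well below it.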

The first stage continues with forward selection from the screened effects that are shown in Output 12.4.4. The effects in the selected model at this stage are shown in Output 12.4.5.

Output 12.4.5: First Stage Selected Effects

Selected Effects:	Intercept xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 cIn1 cIn2 cIn3 cIn4 cIn5

You see that the selected model at this stage includes only effects that are systematically related to the response. If you had requested that only a single-stage screening method be used by specifying the SINGLESTAGE suboption of the SCREEN option, then the selected model at this stage would have been the final selected model. However, multistage screening is used in this example. The second stage repeats the steps of the first stage except that the modeled response is the residuals from the selected model at the first stage.

Output 12.4.6 shows the screening details at the second stage. You see that 20 effects are chosen by screening at this stage, as specified. Because the selected effects from the first stage are orthogonal to the residuals at the first stage, none of these effects are in the screened subset. Furthermore, you see that although the effects xIn1, xWeakIn1, and xWeakIn2 are only weakly correlated with y, they are the effects that are most strongly correlated with the residuals from the first stage.

Output 12.4.6: Second Stage Screening Details

Screening Stage 2: Residual Fit

Effect Screening for Stage 1 Residuals
Rank	Effect	Maximum Absolute Correlation
1	xIn1	0.27373
2	xWeakIn1	0.02352
3	xWeakin2	0.02132
4	cOut295	0.01524
5	cOut35	0.01443
6	cOut323	0.01417
7	cOut202	0.01406
8	xOut6	0.01401
9	cOut154	0.01263
10	cOut54	0.01160
11	cOut181	0.01159
12	cOut115	0.01150
13	cOut403	0.01144
14	xOut332	0.01142
15	xOut409	0.01141
16	cOut267	0.01137
17	cOut374	0.01132
18	cOut254	0.01128
19	xOut204	0.01121
20	cOut147	0.01120
21	xOut113	0.01116*
22	xOut427	0.01115*
23	cOut259	0.01111*
24	cOut170	0.01106*
25	cOut107	0.01102*

* Screened Out

Screened Effects:	xIn1 xWeakIn1 xWeakin2 cOut295 cOut35 cOut323 cOut202 xOut6 cOut154 cOut54 cOut181 cOut115 cOut403 xOut332 xOut409 cOut267 cOut374 cOut254 xOut204 cOut147
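
One way to examine the second-stage screening statistics outside PROC HPREG is to fit the first-stage model and correlate its residuals with candidate regressors. The following statements are a sketch under assumptions that are not part of the original example. They use PROC GLM to refit the first-stage selected model and PROC CORR to compute Pearson correlations. The data set name stage1Resid and the variable name resid are chosen for illustration only.

  /* Sketch: refit the first-stage selected model and save its residuals */
  proc glm data=ex4Data noprint;
     class cIn1-cIn5;
     model y = xIn2-xIn25 cIn1-cIn5;
     output out=stage1Resid(keep=resid xIn1 xWeakIn1 xWeakIn2 xOut6) r=resid;
  run;
  quit;

  /* Sketch: correlate the residuals with a few second-stage candidates */
  proc corr data=stage1Resid nosimple;
     var resid;
     with xIn1 xWeakIn1 xWeakIn2 xOut6;
  run;

If the screening statistic for a continuous effect is the absolute Pearson correlation, the results should be close to the values in Output 12.4.6. Under the data-generating model, the first-stage residual is approximately -xIn1 + 0.1*xWeakIn1 + 0.1*xWeakIn2 plus noise, which has a variance of about 1.085. That implies correlations of about 0.277 for xIn1 and 0.028 for each weak effect, in line with the reported 0.27373, 0.02352, and 0.02132.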

Output 12.4.7 shows the selected effects at the second screening stage. You see that the selected effects include all the remaining effects that are systematically predictive of y but that were not in the screened subset at the first screening stage (xIn1, xWeakIn1, and xWeakin2), together with one noise effect (xOut6).

Output 12.4.7: Second Stage Selected Effects

Selected Effects:	Intercept xIn1 xOut6 xWeakIn1 xWeakin2

In the third and final screening stage, model selection is performed from the union of the screened effects from the first stage (which are shown in Output 12.4.4) and the selected effects from the second stage (which are shown in Output 12.4.7). The selected effects from this final stage are shown in Output 12.4.8.

Output 12.4.8: Final Stage Selected Effects

Selected Effects:	Intercept xIn1 xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 xOut6 xWeakIn1 xWeakin2 cIn1 cIn2 cIn3 cIn4 cIn5

You see that the final selected model contains all the true underlying model effects and just one noise effect (xOut6). Because you specified the DETAILS option in the PERFORMANCE statement, the “Timing” table shown in Output 12.4.9 is displayed.

Output 12.4.9: Timing for Model Selection with Screening

Procedure Task Timing
Task	Seconds	Percent
Reading and Levelizing Data	3.65	18.15%
Loading Design Matrix	3.59	17.84%
Computing Moments	2.82	14.04%
Computing Cross Products Matrix	3.40	16.91%
Performing Model Selection	6.64	33.05%

You see that even though the selected model was obtained by selecting from more than a thousand effects that have more than two thousand parameters, screening enabled the entire modeling task to be completed in about 20 seconds. You can perform the same model selection without screening, as shown in the following statements:


 proc hpreg data=ex4Data;
     class c: ;
     model y = x: c: ;
     selection method=forward;
     performance details;
 run;

In this case, the model that is selected without screening is identical to the model that is obtained with screening. However, there is no guarantee that you will get identical selected models. Output 12.4.10 shows the “Timing” table for the model selection without screening.

Output 12.4.10: Timing for Model Selection without Screening

Procedure Task Timing
Task	Seconds	Percent
Reading and Levelizing Data	3.42	1.18%
Loading Design Matrix	0.83	0.29%
Computing Moments	0.47	0.16%
Computing Cross Products Matrix	110.18	38.09%
Performing Model Selection	174.41	60.29%

You see that the model selection without screening took about 290 seconds, which is substantially slower than the approximately 20 seconds it took when screening was included in the selection process.
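
The totals quoted here can be recomputed from the two “Timing” tables. The following DATA step is a small sketch that is not part of the original example; the variable names are illustrative. It sums the task times in Output 12.4.9 and Output 12.4.10 and forms their ratio.

  data _null_;
     withScreen    = sum(3.65, 3.59, 2.82, 3.40, 6.64);      /* Output 12.4.9  */
     withoutScreen = sum(3.42, 0.83, 0.47, 110.18, 174.41);  /* Output 12.4.10 */
     speedup       = withoutScreen / withScreen;             /* overall ratio  */
     put withScreen= withoutScreen= speedup= 5.1;
  run;

The totals are 20.10 seconds with screening and 289.31 seconds without screening, a ratio of roughly 14. Timings depend on the hardware and the run, so treat this ratio as indicative only.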