This example shows how you can use the SCREEN option in the SELECTION statement to greatly speed up model selection from a large number of regressors. To demonstrate the efficacy of model selection with screening, this example uses simulated data in which the response y depends systematically on a relatively small subset of a much larger set of regressors. The complete set of regressors is described in Table 12.8.
Table 12.8: Complete Set of Regressors
Regressor Name | Type | Number of Levels | In True Model |
---|---|---|---|
xIn1-xIn25 | Continuous | | Yes |
xWeakIn1-xWeakIn2 | Continuous | | Yes |
xOut1-xOut500 | Continuous | | No |
cIn1-cIn5 | Classification | From two to five | Yes |
cOut1-cOut500 | Classification | From two to five | No |
The labels In and Out, which are part of the variable names, make it easy to identify whether the selected model succeeds or fails in capturing the true underlying model. The regressors that are labeled xWeakIn1 and xWeakIn2 are predictive, but their influence is substantially smaller than the influence of the other regressors in the true model.
The following DATA step generates the data:
    %let nObs      = 50000;
    %let nContIn   = 25;
    %let nContOut  = 500;
    %let nClassIn  = 5;
    %let nClassOut = 500;
    %let maxLevs   = 5;
    %let noiseScale= 1;

    data ex4Data;
       array xIn{&nContIn};
       array xOut{&nContOut};
       array cIn{&nClassIn};
       array cOut{&nClassOut};
       drop i j sign nLevs xBeta;

       do i=1 to &nObs;
          sign  = -1;
          xBeta = 0;
          /* Continuous regressors in the true model: coefficient j with alternating sign */
          do j=1 to dim(xIn);
             xIn{j} = ranuni(1);
             xBeta  = xBeta + j*sign*xIn{j};
             sign   = -sign;
          end;
          /* Continuous noise regressors */
          do j=1 to dim(xOut);
             xOut{j} = ranuni(1);
          end;
          /* Weakly predictive regressors: coefficient 0.1 */
          xWeakIn1 = ranuni(1);
          xWeakin2 = ranuni(1);
          xBeta    = xBeta + 0.1*xWeakIn1+ 0.1*xWeakIn2;
          /* Classification regressors in the true model: two to five levels each */
          do j=1 to dim(cIn);
             nLevs  = 2 + mod(j,&maxlevs-1);
             cIn{j} = 1+int(ranuni(1)*nLevs);
             xBeta  = xBeta + j*sign*(cIn{j}-nLevs/2);
             sign   = -sign;
          end;
          /* Classification noise regressors */
          do j=1 to dim(cOut);
             nLevs   = 2 + mod(j,&maxlevs-1);
             cOut{j} = 1+int(ranuni(1)*nLevs);
          end;
          /* Response: linear predictor plus Gaussian noise */
          y = xBeta + &noiseScale*rannor(1);
          output;
       end;
    run;
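As a quick sanity check on the simulated data (an illustrative addition, not part of the original example), the following sketch uses the NLEVELS option in PROC FREQ to report how many distinct levels each classification regressor in the true model has. It assumes that ex4Data was created by the preceding DATA step. Given the expression 2 + mod(j, &maxLevs-1) in that step, cIn1 through cIn5 should have 3, 4, 5, 2, and 3 levels, respectively.

```sas
/* Assumes ex4Data already exists (created by the preceding DATA step). */
proc freq data=ex4Data nlevels;
   /* NOPRINT suppresses the frequency tables; the level counts are still shown */
   tables cIn1-cIn5 / noprint;
run;
```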
When you have insufficient prior knowledge of which effects need to be included in a parsimonious predictive model, a reasonable starting point is to use model selection to build such a model. You might want to consider a large number of candidate effects, even though you know that a model that generalizes well to unseen data depends on a relatively small number of them. In such cases, you can dramatically reduce the computational cost by including screening in the model selection process. The following statements show how you do this:
    proc hpreg data=ex4Data;
       class c: ;
       model y = x: c: ;
       selection method=forward screen(details=all)=100 20;
       performance details;
    run;
The ordered pair of integers that is specified in the SCREEN option in the SELECTION statement requests that screening be used to reduce the set of regressors to 100 regressors at the first screening stage and to 20 regressors at the second screening stage. This information is reflected in the “Screening Information” table shown in Output 12.4.1.
Output 12.4.1: Screening Information
Screening Information | |
---|---|
Screening Stages | Multiple |
Screening Criterion | Maximum Absolute Correlation |
Stage 1 Number of Screened Effects | 100 |
Stage 2 Number of Screened Effects | 20 |
The “Number Of Observations” table in Output 12.4.2 confirms that the data contain 50,000 observations and the “Dimensions” table shows that the selection is from 1,033 effects that have a total of 2,295 parameters.
Output 12.4.2: Number of Observations and Dimensions
Number of Observations Read | 50000 |
---|---|
Number of Observations Used | 50000 |
Dimensions | |
---|---|
Number of Effects | 1033 |
Number of Parameters | 2295 |
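The counts in the "Dimensions" table can be reproduced by hand from the macro variables in the DATA step. The following sketch is an illustrative addition. It assumes that the intercept counts as one effect and one parameter, and that each classification effect contributes one parameter per level; that assumption is consistent with the reported totals of 1,033 effects and 2,295 parameters.

```sas
data _null_;
   /* Effects: intercept + xIn + xWeakIn + xOut + cIn + cOut */
   nEffects = 1 + 25 + 2 + 500 + 5 + 500;

   /* The intercept and each continuous regressor contribute one parameter */
   nParms = 1 + 25 + 2 + 500;

   /* Assumption: each classification effect contributes one parameter per level. */
   /* The number of levels follows 2 + mod(j, &maxLevs-1), with maxLevs = 5.      */
   do j = 1 to 5;                      /* cIn1-cIn5 */
      nParms = nParms + 2 + mod(j, 4);
   end;
   do j = 1 to 500;                    /* cOut1-cOut500 */
      nParms = nParms + 2 + mod(j, 4);
   end;

   put nEffects= nParms=;
run;
```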
Because you specified the DETAILS=ALL suboption of the SCREEN option, you obtain the “Screening” table in Output 12.4.3, which shows how the screened subset of 100 effects is obtained at the first screening stage. For display purposes, some ranks in this table have been suppressed.
Output 12.4.3: First Stage Screening Details
Effect Screening for Response | ||
---|---|---|
Rank | Effect | Maximum Absolute Correlation |
1 | xIn25 | 0.31785 |
2 | xIn24 | 0.30697 |
3 | xIn23 | 0.29734 |
. | . | . |
. | . | . |
98 | xOut338 | 0.00932 |
99 | cOut363 | 0.00922 |
100 | cOut194 | 0.00920 |
101 | xOut125 | 0.00919* |
102 | xOut220 | 0.00916* |
103 | cOut310 | 0.00916* |
104 | cOut49 | 0.00915* |
105 | cOut11 | 0.00915* |
* Screened Out |
The “Screened Effects” table shown in Output 12.4.4 lists the effects from which a model is selected at the first screening stage.
Output 12.4.4: First Stage Screened Effects
Screened Effects: | xIn25 xIn24 xIn23 xIn22 xIn21 xIn20 xIn19 xIn18 xIn17 xIn16 xIn15 xIn14 xIn13 cIn5 xIn12 cIn3 xIn11 xIn10 xIn8 xIn9 xIn7 cIn4 cIn2 xIn6 xIn5 xIn4 xIn3 cIn1 xIn2 cOut498 cOut110 cOut450 cOut441 cOut272 xOut82 cOut45 cOut6 cOut281 cOut134 cOut15 xOut310 xOut252 xOut485 xOut365 cOut138 cOut123 cOut337 cOut195 cOut423 cOut283 cOut62 cOut114 xOut489 cOut14 cOut158 cOut437 xOut64 cOut301 cOut311 cOut187 cOut431 cOut464 cOut388 cOut213 cOut46 xOut329 cOut403 cOut305 cOut171 cOut85 cOut99 cOut249 xOut267 cOut455 cOut457 cOut271 cOut78 xOut93 cOut259 cOut417 cOut258 cOut326 cOut291 cOut263 cOut107 cOut402 cOut17 cOut237 cOut129 cOut198 cOut58 cOut428 cOut135 cOut206 cOut139 cOut113 cOut486 xOut338 cOut363 cOut194 |
---|
You see that the magnitudes of the pairwise correlations of the effects xIn1, xWeakIn1, and xWeakIn2 with the response are too small for those effects to be included as candidates for selection at the first screening stage.
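One way to inspect these marginal correlations directly is with PROC CORR. The following sketch is an illustrative addition and assumes that the screening statistic for a continuous regressor is the absolute value of its Pearson correlation with the response. The regressor xIn25 is included only for comparison. In the DATA step its coefficient has magnitude 25, whereas xIn1 has a coefficient of magnitude 1 and the two weak regressors have coefficients of 0.1.

```sas
/* Assumes ex4Data already exists (created by the DATA step in this example). */
proc corr data=ex4Data nosimple;   /* NOSIMPLE suppresses the descriptive statistics */
   var xIn1 xWeakIn1 xWeakIn2 xIn25;
   with y;                         /* correlate each VAR variable with the response */
run;
```

PROC CORR reports signed correlations, so compare absolute values with the "Maximum Absolute Correlation" column.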
The first stage continues with forward selection from the screened effects that are shown in Output 12.4.4. The effects in the selected model at this stage are shown in Output 12.4.5.
Output 12.4.5: First Stage Selected Effects
Selected Effects: | Intercept xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 cIn1 cIn2 cIn3 cIn4 cIn5 |
---|
You see that the selected model at this stage includes only effects that are systematically related to the response. If you had requested that only a single-stage screening method be used by specifying the SINGLESTAGE suboption of the SCREEN option, then the selected model at this stage would have been the final selected model. However, multistage screening is used in this example. The second stage repeats the steps of the first stage except that the modeled response is the residuals from the selected model at the first stage.
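For comparison, a single-stage request might look like the following sketch. This is an illustrative addition. The placement of SINGLESTAGE inside the parentheses next to DETAILS=, and the use of a single screening size, are assumptions about the syntax; confirm the exact form in the SELECTION statement documentation before relying on it.

```sas
/* Sketch only: assumes SINGLESTAGE is listed inside the parentheses   */
/* together with DETAILS=, and that one screening size is sufficient.  */
proc hpreg data=ex4Data;
   class c: ;
   model y = x: c: ;
   selection method=forward screen(singlestage details=all)=100;
   performance details;
run;
```

With the data in this example, such a run would stop at the first-stage model and would therefore omit xIn1, xWeakIn1, and xWeakIn2.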
Output 12.4.6 shows the screening details at the second stage. You see that 20 effects are chosen by screening at this stage, as specified. Because the selected effects from the first stage are orthogonal to the residuals at the first stage, none of these effects are in the screened subset. Furthermore, you see that although the effects xIn1, xWeakIn1, and xWeakIn2 are weakly correlated with y, they are the effects most strongly correlated with the residuals from the first stage.
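The second-stage idea can be imitated outside PROC HPREG. The following sketch is an illustrative addition, not the procedure's internal implementation. It fits the first-stage selected model with PROC GLM, saves the residuals, and then correlates them with the three effects that were screened out at the first stage. The data set name stage1Resid and the variable name resid1 are hypothetical.

```sas
/* Assumes ex4Data already exists. Fit the first-stage selected model. */
proc glm data=ex4Data noprint;
   class cIn1-cIn5;
   model y = xIn2-xIn25 cIn1-cIn5;
   /* Keep only the residuals and the regressors of interest */
   output out=stage1Resid(keep=resid1 xIn1 xWeakIn1 xWeakIn2) r=resid1;
run;
quit;

/* Correlate the first-stage residuals with the omitted true effects */
proc corr data=stage1Resid nosimple;
   var xIn1 xWeakIn1 xWeakIn2;
   with resid1;
run;
```

If the screening statistic is the absolute Pearson correlation, the magnitudes from this check should be comparable to the top three rows of the second-stage screening table.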
Output 12.4.6: Second Stage Screening Details
Effect Screening for Stage 1 Residuals | ||
---|---|---|
Rank | Effect | Maximum Absolute Correlation |
1 | xIn1 | 0.27373 |
2 | xWeakIn1 | 0.02352 |
3 | xWeakin2 | 0.02132 |
4 | cOut295 | 0.01524 |
5 | cOut35 | 0.01443 |
6 | cOut323 | 0.01417 |
7 | cOut202 | 0.01406 |
8 | xOut6 | 0.01401 |
9 | cOut154 | 0.01263 |
10 | cOut54 | 0.01160 |
11 | cOut181 | 0.01159 |
12 | cOut115 | 0.01150 |
13 | cOut403 | 0.01144 |
14 | xOut332 | 0.01142 |
15 | xOut409 | 0.01141 |
16 | cOut267 | 0.01137 |
17 | cOut374 | 0.01132 |
18 | cOut254 | 0.01128 |
19 | xOut204 | 0.01121 |
20 | cOut147 | 0.01120 |
21 | xOut113 | 0.01116* |
22 | xOut427 | 0.01115* |
23 | cOut259 | 0.01111* |
24 | cOut170 | 0.01106* |
25 | cOut107 | 0.01102* |
* Screened Out |
Screened Effects: | xIn1 xWeakIn1 xWeakin2 cOut295 cOut35 cOut323 cOut202 xOut6 cOut154 cOut54 cOut181 cOut115 cOut403 xOut332 xOut409 cOut267 cOut374 cOut254 xOut204 cOut147 |
---|
Output 12.4.7 shows the selected effects at the second screening stage. You see that the selected effects include all the remaining effects that are systematically predictive of y but that were not in the screened subset at the first screening stage, together with one noise effect (xOut6).
Output 12.4.7: Second Stage Selected Effects
Selected Effects: | Intercept xIn1 xOut6 xWeakIn1 xWeakin2 |
---|
In the third and final screening stage, model selection is performed from the union of the screened effects from the first stage (which are shown in Output 12.4.4) and the selected effects from the second stage (which are shown in Output 12.4.7). The selected effects from this final stage are shown in Output 12.4.8.
Output 12.4.8: Final Stage Selected Effects
Selected Effects: | Intercept xIn1 xIn2 xIn3 xIn4 xIn5 xIn6 xIn7 xIn8 xIn9 xIn10 xIn11 xIn12 xIn13 xIn14 xIn15 xIn16 xIn17 xIn18 xIn19 xIn20 xIn21 xIn22 xIn23 xIn24 xIn25 xOut6 xWeakIn1 xWeakin2 cIn1 cIn2 cIn3 cIn4 cIn5 |
---|
You see that the final selected model contains all the true underlying model effects and just one noise effect (xOut6). Because you specified the DETAILS option in the PERFORMANCE statement, the “Timing” table shown in Output 12.4.9 is displayed.
Output 12.4.9: Timing for Model Selection with Screening
Procedure Task Timing | ||
---|---|---|
Task | Seconds | Percent |
Reading and Levelizing Data | 3.65 | 18.15% |
Loading Design Matrix | 3.59 | 17.84% |
Computing Moments | 2.82 | 14.04% |
Computing Cross Products Matrix | 3.40 | 16.91% |
Performing Model Selection | 6.64 | 33.05% |
You see that even though the selected model was obtained by selecting from more than a thousand effects (1,033 effects that have a total of 2,295 parameters), screening enabled the entire modeling task to be completed in about 20 seconds. You can perform the same model selection without screening as shown in the following statements:
    proc hpreg data=ex4Data;
       class c: ;
       model y = x: c: ;
       selection method=forward;
       performance details;
    run;
In this case, the model that is selected without screening is identical to the model that is obtained with screening. However, there is no guarantee that the two approaches will always select identical models. Output 12.4.10 shows the “Timing” table for the model selection without screening.
Output 12.4.10: Timing for Model Selection without Screening
Procedure Task Timing | ||
---|---|---|
Task | Seconds | Percent |
Reading and Levelizing Data | 3.42 | 1.18% |
Loading Design Matrix | 0.83 | 0.29% |
Computing Moments | 0.47 | 0.16% |
Computing Cross Products Matrix | 110.18 | 38.09% |
Performing Model Selection | 174.41 | 60.29% |
You see that the model selection without screening took about 290 seconds, roughly 14 times longer than the approximately 20 seconds it took when screening was included in the selection process.
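The totals quoted above come from summing the "Seconds" column of each "Timing" table. The following sketch is an illustrative addition with hypothetical variable names. It repeats that arithmetic, giving totals of about 20.1 and 289.3 seconds and a ratio of about 14.4.

```sas
data _null_;
   /* Task times in seconds, copied from the two "Timing" tables */
   withScreen    = sum(3.65, 3.59, 2.82, 3.40, 6.64);
   withoutScreen = sum(3.42, 0.83, 0.47, 110.18, 174.41);

   /* How many times longer the run without screening took */
   ratio = withoutScreen / withScreen;

   put withScreen= withoutScreen= ratio= 5.1;
run;
```

Most of the difference comes from the "Computing Cross Products Matrix" and "Performing Model Selection" tasks, which account for more than 98% of the time in the run without screening.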