The GLMSELECT Procedure

Example 47.6 Elastic Net and External Cross Validation

This example shows how to use the elastic net method for model selection and compares it with the LASSO method. The example also uses k-fold external cross validation as a criterion in the CHOOSE= option to choose the best model based on the penalized regression fit.

This example uses a microarray data set called the leukemia (LEU) data set, which Zou and Hastie (2005) use to demonstrate the performance of the elastic net method in comparison with that of the LASSO method. The LEU data set consists of 7,129 genes and 72 samples, of which 38 are used as training samples. Among the 38 training samples, 27 are type 1 leukemia (acute lymphoblastic leukemia) and 11 are type 2 leukemia (acute myeloid leukemia). The goal is to construct a diagnostic rule based on the expression levels of those 7,129 genes to predict the type of leukemia. The remaining 34 samples are used as the validation data for picking the appropriate model or as the test data for testing the performance of the selected model. The training and validation data sets are available as Sashelp.Leutrain and Sashelp.Leutest, respectively.
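
If you want to verify the dimensions of these data sets before running the selection, a quick PROC CONTENTS check such as the following can be used. This is only an optional sanity check; it assumes that the Sashelp data sets described above are available in your installation:

proc contents data=sashelp.Leutrain short;  /* lists y and the gene variables x1-x7129 */
run;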

To have a basis for comparison, use the following statements to apply the LASSO method for model selection:

ods graphics on;
proc glmselect data=sashelp.Leutrain valdata=sashelp.Leutest
               plots=coefficients;
  model y = x1-x7129/
        selection=LASSO(steps=120 choose=validate);
run;

Details about the number of observations in the training and validation data sets are given in Output 47.6.1 and Output 47.6.2, respectively.

Output 47.6.1: Number of Training Observations Table

The GLMSELECT Procedure

Observation Profile for Analysis Data
Number of Observations Read 38
Number of Observations Used 38
Number of Observations Used for Training 38


Output 47.6.2: Number of Validation Observations Table

Observation Profile for Validation Data
Number of Observations Read 34
Number of Observations Used 34


Output 47.6.3 shows the LASSO Selection Summary table. Although the STEPS= suboption of the SELECTION= option specifies that 120 steps of LASSO selection be performed, the LASSO method terminates at step 81 because the selected model is a perfect fit and the number of effects that can be selected by LASSO is bounded by the number of training samples. Note that a VALDATA= data set is named in the PROC GLMSELECT statement and that the minimum of the validation ASE occurs at step 68. Hence the model at this step is selected, resulting in 35 selected effects. Output 47.6.4 shows the standardized coefficients of all the effects selected at some step of the LASSO method, plotted as a function of the step number.
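
If you want to post-process this selection summary rather than only view it, you can capture it as a data set with an ODS OUTPUT statement. The following sketch assumes that the table's ODS name is SelectionSummary, the name that PROC GLMSELECT typically assigns to this table (you can confirm the name with ODS TRACE ON):

ods output SelectionSummary=work.lassoSummary;  /* save the summary table as a data set */
proc glmselect data=sashelp.Leutrain valdata=sashelp.Leutest;
  model y = x1-x7129/
        selection=LASSO(steps=120 choose=validate);
run;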

Output 47.6.3: LASSO Selection Summary Table

The GLMSELECT Procedure
 
LASSO Selection Summary
 

Step  Effect Entered  Effect Removed  Number Effects In  ASE  Validation ASE
0 Intercept   1 0.8227 1.0287
1 x4847   2 0.7884 0.9787
2 x3320   3 0.7610 0.9507
3 x2020   4 0.4935 0.7061
4 x5039   5 0.2838 0.5527
5 x1249   6 0.2790 0.5518
6 x2242   7 0.2255 0.5513
7 x6539   8 0.2161 0.5498
8 x6055   9 0.2128 0.5488
9 x1779   10 0.1968 0.5205
...  (steps 10-64 omitted)
65   x6021 34 0.0012 0.1534
66   x1745 33 0.0012 0.1535
67 x6376   34 0.0012 0.1535
68 x4831   35 0.0010 0.1531*
69   x6184 34 0.0009 0.1539
70 x3820   35 0.0006 0.1588
71 x3171   36 0.0005 0.1615
72   x6376 35 0.0004 0.1640
73 x5142   36 0.0004 0.1659
74 x1745   37 0.0003 0.1688
75 x3605   38 0.0001 0.1821
76   x573 37 0.0001 0.1825
77 x3221   38 0.0001 0.1842
78   x1745 37 0.0001 0.1842
79 x6163   38 0.0000 0.1918
80   x4831 37 0.0000 0.1920
81 x1745   38 0.0000 0.1940
* Optimal Value Of Criterion


Output 47.6.4: LASSO Coefficient Progression Plot


As discussed earlier, the number of effects that can be selected by the LASSO method is upper-bounded by the number of training samples. However, this is not a restriction for the elastic net method, which incorporates an additional ridge regression penalty. You can use the following statements to apply the elastic net method for model selection:

proc glmselect data=sashelp.Leutrain valdata=sashelp.Leutest
               plots=coefficients;
  model y = x1-x7129/
        selection=elasticnet(steps=120 L2=0.001 choose=validate);
run;
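
These statements request the same 120 selection steps, but the underlying optimization differs from that of LASSO. As a point of reference (this is the standard formulation following Zou and Hastie (2005), not output shown in this example), the elastic net coefficients solve

\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2

where λ1 is the LASSO parameter and λ2 is the ridge regression parameter that the L2= suboption specifies. Setting λ2 = 0 recovers the LASSO objective; with λ2 > 0 the objective is strictly convex, which is why the number of selected effects is no longer bounded by the number of training samples.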

The L2= suboption of the SELECTION= option in the MODEL statement specifies the value of the ridge regression parameter. Output 47.6.5 shows the Elastic Net Selection Summary table. As in the example for LASSO, the STEPS= suboption of the SELECTION= option specifies that 120 steps of elastic net selection be performed. However, in contrast to the early termination in the LASSO method at step 81, the elastic net method runs until the specified 120 steps are performed. With the same VALDATA= data set named in the PROC GLMSELECT statement as in the LASSO example, the minimum of the validation ASE occurs at step 105, and hence the model at this step is selected, resulting in 54 selected effects. Compared with the LASSO method, the elastic net method can select more variables, and the number of selected variables is not restricted by the number of samples (38 for this example). Output 47.6.6 shows the standardized coefficients of all the effects that are selected at some step of the elastic net method, plotted as a function of the step number.

Output 47.6.5: Elastic Net Selection Summary Table

The GLMSELECT Procedure
 
Elastic Net Selection Summary
 

Step  Effect Entered  Effect Removed  Number Effects In  ASE  Validation ASE
0 Intercept   1 0.8227 1.0287
1 x4847   2 0.7885 0.9790
2 x3320   3 0.7611 0.9510
3 x2020   4 0.4939 0.7065
4 x5039   5 0.2845 0.5533
5 x1249   6 0.2799 0.5525
6 x2242   7 0.2259 0.5517
7 x6539   8 0.2165 0.5502
8 x6055   9 0.2130 0.5491
9 x1779   10 0.1974 0.5215
10 x3847   11 0.1067 0.3768
11 x4951   12 0.0947 0.3580
12 x1834   13 0.0887 0.3476
13 x1745   14 0.0837 0.3379
14 x6021   15 0.0811 0.3339
15 x6362   16 0.0668 0.3126
16 x2238   17 0.0604 0.3031
...  (steps 17-99 omitted)
100 x259   53 0.0000 0.1207
101 x1781   54 0.0000 0.1207
102   x7065 53 0.0000 0.1198
103 x4079   54 0.0000 0.1193
104   x6163 53 0.0000 0.1193
105 x6539   54 0.0000 0.1192*
106 x6071   55 0.0000 0.1192
107 x1829   56 0.0000 0.1192
108   x5598 55 0.0000 0.1192
109 x6169   56 0.0000 0.1192
110 x1529   57 0.0000 0.1194
111 x1306   58 0.0000 0.1194
112 x5954   59 0.0000 0.1202
113 x4271   60 0.0000 0.1203
114 x3312   61 0.0000 0.1207
115 x461   62 0.0000 0.1213
116 x4697   63 0.0000 0.1228
117   x259 62 0.0000 0.1231
118 x4186   63 0.0000 0.1252
119   x4271 62 0.0000 0.1258
120 x4939   63 0.0000 0.1261
* Optimal Value Of Criterion


Output 47.6.6: Elastic Net Coefficient Progression Plot


When you have a good estimate of the ridge regression parameter, you can specify it in the L2= suboption of the SELECTION= option in the MODEL statement. When you do not have a good estimate, you can leave the L2= suboption unspecified; in this case, PROC GLMSELECT estimates the ridge regression parameter based on the criterion that is specified in the CHOOSE= option.

If you have a validation data set, you can use the following statements to apply the elastic net method of model selection:

proc glmselect data=sashelp.Leutrain valdata=sashelp.Leutest
               plots=coefficients;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=validate);
run;

PROC GLMSELECT tries a series of candidate values for the ridge regression parameter, which you can control by using the L2HIGH=, L2LOW=, and L2SEARCH= suboptions. The ridge regression parameter is set to the value that achieves the minimum validation ASE (see Figure 47.12 for an illustration). Output 47.6.7 shows the Elastic Net Selection Summary table. Compared with Output 47.6.5, where the ridge regression parameter is fixed by the L2= suboption, this table shows that a slightly lower validation ASE is achieved when the value of the ridge regression parameter is optimized on the validation data set.
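
For example, if you want to confine the search to a narrower range, you can bound it explicitly, as in the following sketch. The bounds shown here are illustrative values only, not recommendations for this data set:

proc glmselect data=sashelp.Leutrain valdata=sashelp.Leutest;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=validate
                             L2LOW=1e-6 L2HIGH=0.1);  /* candidate range for the ridge parameter */
run;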

Output 47.6.7: Elastic Net Selection Summary Table

The GLMSELECT Procedure
 
Elastic Net Selection Summary
 

Step  Effect Entered  Effect Removed  Number Effects In  ASE  Validation ASE
0 Intercept   1 0.8227 1.0287
1 x4847   2 0.7884 0.9787
2 x3320   3 0.7610 0.9508
3 x2020   4 0.4936 0.7061
4 x5039   5 0.2839 0.5527
5 x1249   6 0.2790 0.5519
6 x2242   7 0.2256 0.5513
7 x6539   8 0.2161 0.5498
8 x6055   9 0.2128 0.5488
9 x1779   10 0.1968 0.5205
10 x3847   11 0.1065 0.3765
11 x4951   12 0.0893 0.3501
12 x1834   13 0.0868 0.3458
13 x1745   14 0.0837 0.3397
14 x6021   15 0.0809 0.3354
15 x6362   16 0.0667 0.3143
...  (steps 16-99 omitted)
100   x3171 45 0.0000 0.1376
101 x4461   46 0.0000 0.1348
102 x4697   47 0.0000 0.1338
103   x3605 46 0.0000 0.1330
104 x3221   47 0.0000 0.1329
105 x5002   48 0.0000 0.1320
106   x3820 47 0.0000 0.1319
107   x4697 46 0.0000 0.1315
108 x4052   47 0.0000 0.1276
109 x6184   48 0.0000 0.1267
110 x3605   49 0.0000 0.1252
111 x1975   50 0.0000 0.1241
112 x6757   51 0.0000 0.1224
113   x3221 50 0.0000 0.1206
114 x2183   51 0.0000 0.1205
115 x259   52 0.0000 0.1202
116 x157   53 0.0000 0.1201
117 x1781   54 0.0000 0.1197
118   x7065 53 0.0000 0.1194
119 x4079   54 0.0000 0.1187
120 x6539   55 0.0000 0.1186*
* Optimal Value Of Criterion


Output 47.6.8 shows the standardized coefficients of all the effects that are selected at some step of the elastic net method, plotted as a function of the step number.

Output 47.6.8: Elastic Net Coefficient Progression Plot


When validation data are not available, you can use other criteria supported by the CHOOSE= option for estimating the value of the ridge regression parameter. Criteria such as AIC, AICC, and BIC are defined for the training samples only, and they tend to set the value of the ridge regression parameter to 0 when the number of training samples is less than the number of variables. In this case, the elastic net method reduces to the LASSO method. As an alternative to AIC, AICC, and BIC, you can use either k-fold cross validation or k-fold external cross validation in the CHOOSE= option, where the criterion is computed for held-out data.
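
For illustration, a training-only criterion would be requested as follows; given the caveat above, you should expect this choice to drive the ridge regression parameter toward 0 in this example, where the variables far outnumber the training samples:

proc glmselect data=sashelp.Leutrain;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=AICC);  /* criterion computed on the training data only */
run;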

If you want to use k-fold cross validation to select both the model and the appropriate value of the ridge regression parameter, you can use the following statements to apply the elastic net method of model selection:

proc glmselect data=sashelp.Leutrain testdata=sashelp.Leutest
               plots=coefficients;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=cv) cvmethod=split(4);
run;

Output 47.6.9: Elastic Net Selection Summary Table

The GLMSELECT Procedure
 
Elastic Net Selection Summary
 

Step  Effect Entered  Effect Removed  Number Effects In  ASE  Test ASE  CV PRESS
0 Intercept   1 0.8227 1.0287 31.4370
1 x4847   2 0.7884 0.9787 10.6957
2 x3320   3 0.7610 0.9507 10.2065
3 x2020   4 0.4935 0.7061 7.5478
4 x5039   5 0.2838 0.5527 5.7695
5 x1249   6 0.2790 0.5518 5.0187
6 x2242   7 0.2255 0.5513 5.4050
7 x6539   8 0.2161 0.5498 7.0000
8 x6055   9 0.2128 0.5488 6.0547
9 x1779   10 0.1968 0.5205 5.6886
10 x3847   11 0.1065 0.3765 2.9205
11 x4951   12 0.0889 0.3495 3.0756
12 x1834   13 0.0867 0.3457 3.3848
13 x1745   14 0.0837 0.3398 4.5090
14 x6021   15 0.0809 0.3355 6.7676
15 x6362   16 0.0667 0.3144 10.5450
16 x2238   17 0.0622 0.3078 10.5852
...  (steps 17-69 omitted)
70 x3820   35 0.0006 0.1588 21.0175
71 x3171   36 0.0005 0.1616 21.0175
72   x6376 35 0.0004 0.1640 21.0175
73 x5142   36 0.0004 0.1659 21.0175
74 x1745   37 0.0003 0.1688 21.0175
75 x3605   38 0.0001 0.1821 21.0175
76   x573 37 0.0001 0.1824 24.0303
77 x3221   38 0.0001 0.1841 24.0303
78   x1745 37 0.0001 0.1841 24.0303
79 x6163   38 0.0000 0.1918 24.0303
80   x4831 37 0.0000 0.1919 24.0303
81 x1745   38 0.0000 0.1933 24.0303
82 x4831   39 0.0000 0.1937 24.0303
83 x2661   40 0.0000 0.1864 24.0303
84   x3605 39 0.0000 0.1857 24.0303
85   x464 38 0.0000 0.1857 7.2724
86 x5766   39 0.0000 0.1854 7.2724
87 x464   40 0.0000 0.1763 7.2724
88 x6376   41 0.0000 0.1735 7.2724
89   x464 40 0.0000 0.1675 7.2724
90   x4079 39 0.0000 0.1674 1.2851
91 x1987   40 0.0000 0.1625 1.2846*
* Optimal Value Of Criterion


A TESTDATA= data set is named in the PROC GLMSELECT statement; it is the same data set that was used earlier for validation. Because the L2= suboption is not specified, PROC GLMSELECT tries a series of candidate values for the ridge regression parameter and sets it to the value that achieves the minimum CVPRESS score, as requested by the CHOOSE=CV suboption. Output 47.6.9 shows the Elastic Net Selection Summary table that corresponds to the selected ridge regression parameter. The elastic net method achieves the smallest CVPRESS score at step 91. Hence the model at this step is selected, resulting in 40 selected effects.
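
The CVMETHOD=SPLIT(4) option in the preceding statements forms the four cross validation folds deterministically from the observation order. CVMETHOD= also supports other fold assignments; for example, the following sketch requests five randomly assigned folds (the resulting CVPRESS values will of course differ from those shown above):

proc glmselect data=sashelp.Leutrain testdata=sashelp.Leutest;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=cv) cvmethod=random(5);
run;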

Output 47.6.10 shows the standardized coefficients of all the effects selected at some step of the elastic net method, plotted as a function of the step number. Note that k-fold cross validation uses a least squares fit to compute the CVPRESS score. Thus the criterion does not directly depend on the penalized regression used in the elastic net method, and the CVPRESS curve in Output 47.6.10 has abrupt jumps.

Output 47.6.10: Elastic Net Coefficient Progression Plot


If you want to select both the ridge regression parameter and the variables in the model based on a criterion that directly depends on the penalized regression that the elastic net method uses, you can use k-fold external cross validation by specifying the following statements:

proc glmselect data=sashelp.Leutrain testdata=sashelp.Leutest
               plots=coefficients;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=cvex) cvmethod=split(4);
run;

Output 47.6.11: Elastic Net Coefficient Progression Plot


As before, a TESTDATA= data set is named in the PROC GLMSELECT statement to assess the prediction accuracy of the model that is fit on the training samples. Because the L2= suboption is not specified, PROC GLMSELECT tries a series of candidate values for the ridge regression parameter. Output 47.6.11 shows the standardized coefficients of all the effects that are selected at some step of the elastic net method, plotted as a function of the step number, together with the curve of the CVEXPRESS statistic as a function of the step number. A comparison of the criterion curves in Output 47.6.10 and Output 47.6.11 shows that the CVEXPRESS statistic is smoother than the CVPRESS statistic: the CVEXPRESS statistic is based on a penalized model, whereas the CVPRESS statistic is based on an ordinary least squares model. Output 47.6.12 shows the Elastic Net Selection Summary table that corresponds to the ridge regression parameter selected by k-fold external cross validation. The elastic net method achieves the smallest CVEXPRESS score at step 120, and hence the model at this step is selected, resulting in 53 selected effects.
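
If you also want the model's predictions for the test samples, and not just the aggregate test ASE, one option is to add a SCORE statement, which PROC GLMSELECT provides for scoring additional data sets with the selected model. The output data set name in this sketch is arbitrary:

proc glmselect data=sashelp.Leutrain testdata=sashelp.Leutest;
  model y = x1-x7129/
        selection=elasticnet(steps=120 choose=cvex) cvmethod=split(4);
  score data=sashelp.Leutest out=work.scored;  /* predicted values from the selected model */
run;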

Output 47.6.12: Elastic Net Selection Summary Table

The GLMSELECT Procedure
 
Elastic Net Selection Summary
 

Step  Effect Entered  Effect Removed  Number Effects In  ASE  Test ASE  CVEX PRESS
0 Intercept   1 0.8227 1.0287 0.8265
1 x4847   2 0.7884 0.9787 0.7940
2 x3320   3 0.7610 0.9507 0.7698
3 x2020   4 0.4936 0.7061 0.5195
4 x5039   5 0.2839 0.5527 0.3554
5 x1249   6 0.2790 0.5519 0.3526
6 x2242   7 0.2255 0.5513 0.3218
7 x6539   8 0.2161 0.5498 0.3166
8 x6055   9 0.2128 0.5488 0.3148
9 x1779   10 0.1968 0.5205 0.3096
10 x3847   11 0.1065 0.3765 0.2671
11 x4951   12 0.0891 0.3498 0.2586
12 x1834   13 0.0868 0.3458 0.2579
13 x1745   14 0.0837 0.3397 0.2570
14 x6021   15 0.0809 0.3355 0.2561
15 x6362   16 0.0667 0.3144 0.2510
16 x2238   17 0.0621 0.3075 0.2498
...  (steps 17-99 omitted)
100 x2121   45 0.0000 0.1402 0.2350
101 x6021   46 0.0000 0.1386 0.2334
102   x3171 45 0.0000 0.1377 0.2324
103 x4461   46 0.0000 0.1350 0.2302
104 x4697   47 0.0000 0.1338 0.2299
105   x3605 46 0.0000 0.1331 0.2293
106 x3221   47 0.0000 0.1330 0.2289
107 x5002   48 0.0000 0.1320 0.2282
108   x3820 47 0.0000 0.1319 0.2281
109   x4697 46 0.0000 0.1315 0.2278
110 x4052   47 0.0000 0.1276 0.2235
111 x6184   48 0.0000 0.1266 0.2229
112 x3605   49 0.0000 0.1252 0.2215
113 x1975   50 0.0000 0.1240 0.2212
114 x6757   51 0.0000 0.1222 0.2200
115   x3221 50 0.0000 0.1206 0.2185
116 x2183   51 0.0000 0.1205 0.2185
117 x259   52 0.0000 0.1202 0.2182
118 x157   53 0.0000 0.1201 0.2181
119 x1781   54 0.0000 0.1197 0.2177
120   x7065 53 0.0000 0.1194 0.2177*
* Optimal Value Of Criterion