The GLMSELECT Procedure

Example 49.8 Group LASSO Selection

This example shows how you can use the group LASSO method for model selection. It treats the parameters that correspond to the same spline effect or CLASS variable as a group, and it also uses a collection effect to group otherwise unrelated parameters. The following DATA step generates the data:

%macro makeRegressorData(data=,nObs=500,nCont=5,nClass=5,nLev=3);
   data &data;
      drop i j;
      %if &nCont>0  %then %do; array x{&nCont}  x1-x&nCont; %end;
      %if &nClass>0 %then %do; array c{&nClass} c1-c&nClass;%end;
      do i = 1 to &nObs;
         %if &nCont>0 %then %do;
            do j= 1 to &nCont;
               x{j} = rannor(1);
            end;
         %end;
         %if &nClass > 0 %then %do;
            do j=1 to &nClass;
               if      mod(j,3) = 0 then c{j} = ranbin(1,&nLev,.6);
               else if mod(j,3) = 1 then c{j} = ranbin(1,&nLev,.5);
               else if mod(j,3) = 2 then c{j} = ranbin(1,&nLev,.4);
            end;
         %end;
         output;
      end;
   run;
%mend;
%makeRegressorData(data=traindata,nObs=500,nCont=5,nClass=5,nLev=3);

In the generated data, there are five continuous effects, x1–x5, and five CLASS variables, c1–c5. The following DATA step generates a response, y, that is dependent on x1–x4 and c1:

%macro AddDepVar(data=,modelRHS =,errorStd = 1);
   data &data;
      set &data;
      y = &modelRHS + &errorStd * rannor(1);
   run;
%mend;

%AddDepVar(data    = traindata, 
           modelRHS= x1 + 
                     0.1*x2 - 0.1*x3 - 0.01* x4 -
                     c1, 
           errorStd= 1);

The effects x1 and c1 have a stronger influence on the response y than x2 and x3, and x4 has the weakest influence. For the continuous effect x1, a spline effect is constructed using the cubic B-spline basis (see the section EFFECT Statement in Chapter 19: Shared Concepts and Topics) as follows:

effect s1=spline(x1);

Models that contain spline effects are particularly good candidates for group LASSO selection, because the individual parameters in a spline effect have no meaning by themselves; all of them must be in the model together in order to define the spline term correctly.

To have a basis for comparison, first use the following statements to apply LASSO to model selection:

ods graphics on;

proc glmselect data=traindata plots=coefficients;
  class c1-c5/split;
  effect s1=spline(x1/split);
  model y = s1 x2-x5 c:/
        selection=lasso(steps=20 choose=sbc);
run;

In LASSO selection, effects that have multiple parameters are split into their constituent parameters. In this problem, the spline effect has seven parameters by default and each of the five CLASS effects has four parameters. So together with the intercept and four continuous main effects, your model has a total of 32 parameters, as shown in the "Dimensions" table in Output 49.8.1. Note that the number of effects in the table is 17, because the spline effect s1 is counted as 7 effects.
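For reference, the parameter count decomposes as the intercept (1 parameter), the spline effect s1 (7 parameters), the continuous effects x2–x5 (4 parameters), and the five CLASS variables ($5 \times 4 = 20$ parameters), giving $1 + 7 + 4 + 20 = 32$ parameters in all.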

Output 49.8.1: Dimensions

The GLMSELECT Procedure

Dimensions
Number of Effects                17
Number of Effects after Splits   32
Number of Parameters             32



Output 49.8.2 shows the "LASSO Selection Summary" table, and Output 49.8.3 shows the standardized coefficients of all the effects that are selected at some step of the LASSO method, plotted as a function of the step number. Because you specified CHOOSE=SBC to pick the best model, the SBC value for the model at each step is also displayed in Output 49.8.3, with the minimum occurring at step 12. Hence the model at that step is selected; it contains 13 effects, including the intercept. The "Parameter Estimates" table in Output 49.8.4 shows that x2 and x3, as well as parts of the spline effect and parts of three CLASS effects, are selected for the final model. So this selection misses the small true effect of x4 and erroneously includes c3 and c4 because of noise.

Output 49.8.2: LASSO Selection Summary Table

The GLMSELECT Procedure
 
LASSO Selection Summary
 

Step   Effect Entered   Effect Removed   Number Effects In   SBC
   0   Intercept                                  1          491.1109
   1   s1:5                                       2          410.2008
   2   s1:2                                       3          386.7278
   3   c1_3                                       4          388.1524
   4   s1:6                                       5          385.0045
   5   c1_0                                       6          216.9169
   6   c1_2                                       7           84.3459
   7   x2                                         8           82.8547
   8   s1:3                                       9           52.3145
   9   c4_0                                      10           46.8144
  10   c3_1                                      11           46.6228
  11   x3                                        12           46.2866
  12   c3_2                                      13           42.7335*
  13   c5_3                                      14           42.8659
  14   c2_0                                      15           47.1151
  15   s1:7                                      16           52.5826
  16   x5                                        17           58.3346
  17   c3_0                                      18           63.8812
  18   c2_3                                      19           69.4954
  19   x4                                        20           75.6385
  20   s1:1                                      21           79.8007
* Optimal Value of Criterion



Output 49.8.3: LASSO Coefficient Progression Plot



Output 49.8.4: LASSO Parameter Estimates

Parameter Estimates

Parameter   DF   Estimate
Intercept    1   -1.636852
s1:2         1   -3.829173
s1:3         1   -0.192499
s1:5         1    2.259207
s1:6         1    3.040642
x2           1    0.098525
x3           1   -0.033892
c1_0         1    0.852990
c1_2         1   -0.635575
c1_3         1   -1.643039
c3_1         1    0.070937
c3_2         1   -0.022502
c4_0         1   -0.149375
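The selection summary and parameter estimates that are displayed in Output 49.8.2 and Output 49.8.4 can also be saved as SAS data sets for further processing. The following statements are a minimal sketch; the ODS table names SelectionSummary and ParameterEstimates are assumptions made here and are not part of the original example:

ods output SelectionSummary=lassoSummary        /* assumed name of the selection summary ODS table   */
           ParameterEstimates=lassoEstimates;   /* assumed name of the parameter estimates ODS table */

proc glmselect data=traindata;
  class c1-c5/split;
  effect s1=spline(x1/split);
  model y = s1 x2-x5 c:/
        selection=lasso(steps=20 choose=sbc);
run;

proc print data=lassoEstimates noobs;
run;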



In contrast to LASSO selection, group LASSO selection does not split spline effects and classification effects by default. In addition, you can use a collection effect to group three of the continuous effects, as shown in the following statements:

proc glmselect data=traindata plots=coefficients;
  class c1-c5;
  effect s1=spline(x1);
  effect s2=collection(x2 x3 x4);
  model y = s1 s2 x5 c:/
        selection=grouplasso(steps=20 choose=sbc rho=0.8);
run;

Because spline and classification effects are not split by default in group LASSO selection, this model contains nine effects: s1, s2, x5, c1–c5, and the intercept. The number of parameters is the same as before, as shown in Output 49.8.5.

Output 49.8.5: Dimensions

The GLMSELECT Procedure

Dimensions
Number of Effects       9
Number of Parameters   32



The RHO=0.8 option specifies the value of $\rho$ for determining the regularization parameter $\lambda = \rho^i$ that is used in the $i$th step of the group LASSO selection process. If you use a larger value for $\rho$, you can expect the plot shown in Output 49.8.7 to exhibit a finer coefficient progression.
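For example, because $\lambda = \rho^i$ at step $i$, specifying RHO=0.8 produces a geometrically decreasing sequence of regularization parameters: $\lambda_1 = 0.8$, $\lambda_2 = 0.64$, $\lambda_3 = 0.512$, and so on, down to $\lambda_{20} = 0.8^{20} \approx 0.012$ at the last step requested by STEPS=20. A value of $\rho$ closer to 1 makes this sequence decrease more slowly, so the coefficient paths are traced on a finer grid of $\lambda$ values.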

Output 49.8.6 shows the "Group LASSO Selection Summary" table, and Output 49.8.7 shows the standardized coefficients of all the effects that are selected at some step of the group LASSO method, plotted as a function of the step number. As you can see in this plot, the CHOOSE=SBC option selects the model at step 10 as the one that has the minimum value of the SBC statistic. The resulting model contains four effects (the intercept, s1, s2, and c1), including all the effects in the true model, and 15 parameters, as shown in the "Parameter Estimates" table in Output 49.8.8.

Unlike LASSO, which adds or drops a single effect at each step, the group LASSO method can add or drop more than one effect in a step. Indeed, you can see in Output 49.8.6 that steps 11 and 16 each add two effects to the model. One-effect-at-a-time selection is not possible because the group LASSO solution path, unlike the LASSO solution path, is not piecewise linear; solutions are instead computed at the discrete sequence of regularization parameters $\lambda = \rho^i$, so several effects can enter or leave the model between consecutive steps.

Output 49.8.6: Group LASSO Selection Summary Table

The GLMSELECT Procedure
 
Group LASSO Selection Summary
 

Step   Effect Entered   Effect Removed   Number Effects In   SBC
   0   Intercept                                  1          491.1109
   1   s1                                         2          457.0038
   2   c1                                         3          400.1165
   3                                              3          302.8975
   4                                              3          228.2883
   5                                              3          173.3913
   6                                              3          134.3346
   7                                              3          107.2038
   8                                              3           88.6178
   9   s2                                         4           91.9567
  10                                              4           79.7022*
  11   c3 c4                                      6          119.8826
  12                                              6          111.5926
  13                                              6          105.8915
  14                                              6          101.9617
  15   c5                                         7          123.6787
  16   x5 c2                                      9          152.2387
  17                                              9          150.4210
  18                                              9          149.1847
  19                                              9          148.3496
  20                                              9          147.7893
* Optimal Value of Criterion



Output 49.8.7: Group LASSO Coefficient Progression Plot



Output 49.8.8: Group LASSO Parameter Estimates

Parameter Estimates

Parameter    Estimate
Intercept   -1.439483
s1 1        -5.320998
s1 2        -3.058403
s1 3        -1.090183
s1 4        -0.240067
s1 5         1.458828
s1 6         2.404019
s1 7         3.023258
s2 x2        0.040036
s2 x3       -0.022261
s2 x4       -0.005458
c1 0         1.075888
c1 1         0.326289
c1 2        -0.366125
c1 3        -1.240907
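The group LASSO estimates in Output 49.8.8 are shrunken toward zero by the penalty. If you want ordinary least squares estimates for the effects that are selected at step 10, one possible follow-up (shown here only as a sketch; it is not part of the original example) is to refit those effects by specifying SELECTION=NONE:

proc glmselect data=traindata;
  class c1;
  effect s1=spline(x1);
  effect s2=collection(x2 x3 x4);
  /* refit the step-10 effects (s1, s2, and c1) without penalization */
  model y = s1 s2 c1 / selection=none;
run;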