The GLMSELECT Procedure

Group LASSO Selection (GROUPLASSO)

The group LASSO method proposed by Yuan and Lin (2006) is a variant of LASSO that is specifically designed for linear models defined in terms of effects that have multiple degrees of freedom, such as the main effects of CLASS variables, interactions between CLASS variables, and effects defined using an EFFECT statement.

Recall that LASSO selection depends on solving a constrained least squares problem of the form

\[ \min ||\mb{y}-\bX \bbeta ||^2 \qquad \mbox{subject to} \quad \sum_{j=1}^{m} |\beta_j| \leq t \]

In this formulation, you can include or exclude individual parameters from the model independently, subject only to the overall constraint. In contrast, the group LASSO method uses a constraint that forces all parameters that correspond to the same effect to be included or excluded simultaneously. For a model that has k effects, let $\beta _{G_ j}$ be the group of linear coefficients that correspond to effect j in the model. Then group LASSO depends on solving a constrained optimization problem of the form

\[ \min ||\mb{y}-\bX \bbeta ||^2 \qquad \mbox{subject to} \quad \sum_{j=1}^{k} \sqrt{|G_j|} \, ||\beta_{G_j}|| \leq t \]

where $|G_ j|$ is the number of parameters that correspond to effect j, and $||\beta _{G_ j}||$ denotes the Euclidean norm of the parameters $\beta _{G_ j}$. That is, instead of constraining the sum of the absolute value of individual parameters, group LASSO constrains the Euclidean norm of groups of parameters, where groups are defined by effects.
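To make the difference between the two penalties concrete, the following Python sketch (illustrative only; the coefficient values and group sizes are hypothetical, and this is not PROC GLMSELECT code) evaluates both penalties for a small coefficient vector partitioned by effect. Note that all the parameters of the third effect are zero, so that entire effect contributes nothing to the group penalty, which is how group LASSO excludes a whole effect at once.

```python
import math

# Coefficients grouped by effect: for example, a 3-level CLASS effect
# (2 parameters), a continuous effect (1 parameter), and a 4-level CLASS
# effect (3 parameters). The grouping is hypothetical, for illustration.
groups = [
    [0.5, -1.2],         # effect 1: |G_1| = 2
    [2.0],               # effect 2: |G_2| = 1
    [0.0, 0.0, 0.0],     # effect 3: |G_3| = 3 (entire effect excluded)
]

def lasso_penalty(groups):
    """LASSO penalty: sum of |beta_j| over all individual parameters."""
    return sum(abs(b) for g in groups for b in g)

def group_lasso_penalty(groups):
    """Group LASSO penalty: sum over effects of
    sqrt(|G_j|) times the Euclidean norm of beta_{G_j}."""
    return sum(math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g))
               for g in groups)

print(lasso_penalty(groups))        # 0.5 + 1.2 + 2.0
print(group_lasso_penalty(groups))  # sqrt(2)*1.3 + 1*2.0 + sqrt(3)*0
```

The zero group illustrates the all-or-nothing behavior: shrinking one parameter of an effect to zero does not reduce that effect's contribution to zero unless the whole group vanishes.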

You can write the group LASSO method in the equivalent Lagrangian form

\[ \min ||\mb{y}-\bX \bbeta ||^2 + \lambda \sum_{j=1}^{k} \sqrt{|G_j|} \, ||\beta_{G_j}|| \]

The weight $\sqrt{|G_j|}$, suggested by Yuan and Lin (2006), adjusts the penalty for the size of each group so that effects are not favored or disfavored merely because they contain more parameters.

Unlike LASSO, group LASSO does not have a piecewise linear solution path, so its solutions cannot be generated by a LAR-type algorithm. Instead, the method that Nesterov (2013) proposes is adopted to solve the Lagrangian form of the group LASSO problem that corresponds to a prespecified regularization parameter, $\lambda $. Nesterov’s method is known to achieve an optimal convergence rate for first-order black-box optimization. Because the optimal $\lambda $ is usually unknown, a sequence of regularization parameters, $\rho , \rho ^2, \rho ^3, \ldots ,$ is employed, where $\rho $ is a positive value less than 1. You can specify $\rho $ by using the RHO= suboption of the SELECTION= option in the MODEL statement; by default, RHO=0.9. In the ith step of group LASSO selection, the value used for $\lambda $ is $\rho ^ i$. If you want the solution that corresponds to a particular prespecified $\lambda $, you can specify that value by using the L1= option together with STOP=L1.

Another distinctive feature of the group LASSO method is that it does not necessarily add or remove exactly one effect at each step of the selection process; in this respect it differs from the forward, stepwise, backward, LAR, LASSO, and elastic net selection methods.