The GLMSELECT Procedure

Features

The main features of the GLMSELECT procedure are as follows:

  • Model Specification

    • supports different parameterizations for classification effects

    • supports any degree of interaction (crossed effects) and nested effects

    • supports hierarchy among effects

    • supports partitioning of data into training, validation, and testing roles

    • supports constructed effects including spline and multimember effects

  • Selection Control

    • provides multiple effect selection methods

    • enables selection from a very large number of effects (tens of thousands)

    • offers selection of individual levels of classification effects

    • provides effect selection based on a variety of selection criteria

    • provides stopping rules based on a variety of model evaluation criteria

    • provides leave-one-out, k-fold cross validation, and k-fold external cross validation

    • supports data resampling and model averaging

  • Display and Output

    • produces graphical representations of the selection process

    • produces output data sets containing predicted values and residuals

    • produces an output data set containing the design matrix

    • produces macro variables containing selected models

    • supports parallel processing of BY groups

    • supports multiple SCORE statements
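For example, the following statements exercise several of these features in a single step. This is a minimal sketch: the data sets Work.Sales and Work.NewStores and all variable names are hypothetical and are shown only to illustrate the syntax.

   proc glmselect data=Work.Sales plots=all outdesign=DesignMat;
      class Region Store / param=glm;                  /* classification effects            */
      effect SplPrice = spline(Price / degree=3);      /* constructed (spline) effect       */
      partition fraction(validate=0.3 test=0.2);       /* training/validation/testing roles */
      model Sales = Region|Store SplPrice Promo1-Promo5
                    / selection=stepwise(select=sbc choose=validate);
      output out=Pred predicted=p residual=r;          /* predicted values and residuals    */
      score data=Work.NewStores out=Scored;            /* score an additional data set      */
   run;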

The GLMSELECT procedure supports the following effect selection methods. For more information about these methods, see the section Model-Selection Methods.

forward selection

starts with no effects in the model and adds effects.
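For example, the following statements request forward selection with the SBC criterion used to choose the effect to add at each step. The data set Work.Fitness and its variables are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=forward(select=sbc);
   run;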

backward elimination

starts with all effects in the model and deletes effects.
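For example, the following statements start with all listed effects in the model and remove them based on the AICC criterion. The data set and variable names are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=backward(select=aicc);
   run;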

stepwise regression

is similar to forward selection except that effects already in the model do not necessarily stay there.
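For example, the following statements request stepwise selection, using SBC to select effects at each step and the PRESS statistic to choose among the models visited during the selection process. The data set and variable names are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=stepwise(select=sbc choose=press);
   run;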

least angle regression (LAR)

is similar to forward selection in that it starts with no effects in the model and adds effects. The parameter estimates at any step are "shrunk" when compared to the corresponding least squares estimates.
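For example, the following statements request least angle regression, with Mallows' C(p) used to choose the final model from the sequence of models that LAR produces. The data set and variable names are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=lar(choose=cp stop=none);
   run;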

LASSO

adds and deletes parameters based on a version of ordinary least squares in which the sum of the absolute values of the regression coefficients is constrained.
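For example, the following statements request LASSO selection, with five-fold cross validation used to choose among the models on the LASSO path. The data set and variable names are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=lasso(choose=cv stop=none) cvmethod=random(5);
   run;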

elastic net

is an extension of LASSO that estimates parameters based on a version of ordinary least squares in which both the sum of the absolute values of the regression coefficients and the sum of the squared regression coefficients are constrained.
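For example, the following statements request elastic net selection with cross validation used to choose the final model. The data set and variable names are hypothetical; suboptions that control the ridge (L2) penalty are available but are not shown in this sketch.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=elasticnet(choose=cv);
   run;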

group LASSO

is a variant of LASSO that estimates parameters based on a version of ordinary least squares in which the sum of the Euclidean norms of the groups of regression coefficients is constrained.
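For example, in the following statements the levels of each classification effect form natural groups, so group LASSO adds or removes all parameters associated with an effect together. The data set and variable names are hypothetical.

   proc glmselect data=Work.Survey;
      class Region Education;
      model Income = Region Education Age Hours / selection=grouplasso;
   run;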

PROC GLMSELECT also supports hybrid versions of the LAR and LASSO methods. They use LAR and LASSO to select the model but then estimate the regression coefficients by ordinary weighted least squares.
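For example, the following statements sketch a hybrid fit: the LSCOEFFS suboption (assumed here to be available for both LAR and LASSO) requests that, after LAR selects the effects at each step, the reported coefficients be recomputed by least squares. The data set and variable names are hypothetical.

   proc glmselect data=Work.Fitness;
      model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
                     / selection=lar(lscoeffs choose=sbc);
   run;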

The GLMSELECT procedure is intended primarily as a model selection procedure and does not include regression diagnostics or other postselection facilities such as hypothesis testing, testing of contrasts, and LS-means analyses. The intention is that you use PROC GLMSELECT to select a model or a set of candidate models, and then investigate those models further with existing regression procedures.
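For example, because PROC GLMSELECT stores the list of effects in the selected model in the _GLSIND macro variable, you can pass the selected model to another procedure. The following sketch refits a selected model with PROC REG to obtain diagnostics; the data set and variable names are hypothetical, and the sketch assumes that only continuous effects are selected, since PROC REG does not support CLASS variables.

   proc glmselect data=Work.Sales;
      model Sales = Price Promo1-Promo5 / selection=lasso(choose=cv);
   run;

   /* _GLSIND contains the effects in the selected model */
   proc reg data=Work.Sales;
      model Sales = &_GLSIND;
   run;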