Features

The main features of the GLMSELECT procedure are as follows:

  • Model Specification

    • supports different parameterizations for classification effects

    • supports any degree of interaction (crossed effects) and nested effects

    • supports hierarchy among effects

    • supports partitioning of data into training, validation, and testing roles

    • supports constructed effects including spline and multimember effects


  • Selection Control

    • provides multiple effect selection methods

    • enables selection from a very large number of effects (tens of thousands)

    • offers selection of individual levels of classification effects

    • provides effect selection based on a variety of selection criteria

    • provides stopping rules based on a variety of model evaluation criteria

    • provides leave-one-out and k-fold cross validation

    • supports data resampling and model averaging

  • Display and Output

    • produces graphical representations of the selection process

    • produces output data sets containing predicted values and residuals

    • produces an output data set containing the design matrix

    • produces macro variables containing selected models

    • supports parallel processing of BY groups

    • supports multiple SCORE statements
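
Several of the features listed above can be combined in a single step. The following is a minimal sketch rather than a recommended analysis. It assumes a hypothetical data set work.train with a response y, continuous variables x1-x5, and classification variables c1 and c2. The seed, partition fractions, knot count, and output names are likewise illustrative assumptions.

```sas
ods graphics on;  /* needed for the PLOTS= request below */

/* Hypothetical input: work.train with y, x1-x5, c1, c2 */
proc glmselect data=work.train seed=12345 plots=all;
   /* reference-cell parameterization for the classification effects */
   class c1 c2 / param=ref;

   /* constructed spline effect built from x1 */
   effect spl = spline(x1 / knotmethod=equal(5));

   /* randomly assign 30% of rows to validation and 20% to testing */
   partition fraction(validate=0.3 test=0.2);

   /* crossed effects, model hierarchy, and a validation-based choice */
   model y = c1|c2 x2-x5 spl
         / selection=stepwise(select=sbc choose=validate stop=none)
           hierarchy=single;

   /* predicted values and residuals written to an output data set */
   output out=work.pred predicted=p_y residual=r_y;
run;
```

Here SELECT= controls which effect enters or leaves at each step, STOP= controls when the process ends, and CHOOSE= picks the final model from the sequence of steps.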

The GLMSELECT procedure supports the following effect selection methods. These methods are explained in detail in the section Model-Selection Methods.

FORWARD

Forward selection. This method starts with no effects in the model and adds effects.

BACKWARD

Backward elimination. This method starts with all effects in the model and deletes effects.

STEPWISE

Stepwise regression. This is similar to the FORWARD method except that effects already in the model do not necessarily stay there.

LAR

Least angle regression. This method, like forward selection, starts with no effects in the model and adds effects. The parameter estimates at any step are "shrunk" when compared to the corresponding least squares estimates.

LASSO

Least absolute shrinkage and selection operator. This method adds and deletes parameters based on a version of ordinary least squares where the sum of the absolute regression coefficients is constrained.

Hybrid versions of LAR and LASSO are also supported. They use LAR or LASSO to select the model, but then estimate the regression coefficients by ordinary weighted least squares.
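
As a rough illustration of how the methods above are requested, the sketch below uses the SELECTION= option in the MODEL statement. The data set work.train and the variables y and x1-x5 are hypothetical, and the suboption values are illustrative only.

```sas
/* Forward selection with the default criteria */
proc glmselect data=work.train;
   model y = x1-x5 / selection=forward;
run;

/* LASSO: run the full path, then choose the step by 5-fold cross validation */
proc glmselect data=work.train seed=1;
   model y = x1-x5 / selection=lasso(choose=cv stop=none)
                     cvmethod=random(5);
run;

/* Hybrid LAR: LAR picks the sequence of models,
   least squares supplies the coefficients */
proc glmselect data=work.train;
   model y = x1-x5 / selection=lar(lscoeffs);
run;
```

The BACKWARD and STEPWISE methods are requested the same way, with SELECTION=BACKWARD or SELECTION=STEPWISE.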

The GLMSELECT procedure is intended primarily as a model selection procedure and does not include regression diagnostics or other postselection facilities such as hypothesis testing, testing of contrasts, and LS-means analyses. The intention is that you use PROC GLMSELECT to select a model or a set of candidate models. You can then investigate these models further by fitting them in existing regression procedures.
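
One way to carry out this two-stage workflow is to pass the selected effects to a follow-up procedure through the macro variable that PROC GLMSELECT creates. The sketch below assumes the same hypothetical data set work.train (response y, continuous x1-x5, classification variables c1 and c2). The choice of PROC GLM as the follow-up procedure is just one possibility.

```sas
/* Stage 1: select a model */
proc glmselect data=work.train;
   class c1 c2;
   model y = c1 c2 x1-x5 / selection=backward;
run;

/* Stage 2: refit the selected model where diagnostics and tests are available.
   &_GLSIND holds the effects in the model that PROC GLMSELECT selected. */
proc glm data=work.train;
   class c1 c2;
   model y = &_GLSIND / solution;
run;
quit;
```

Statements such as CONTRAST or LSMEANS can then be added to the PROC GLM step, provided that the effects they name were retained in the selected model.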