SAS/STAT Software


The GLMSELECT procedure performs effect selection in the framework of general linear models. A variety of model selection methods are available, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). The procedure offers extensive capabilities for customizing the selection with a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. The procedure also provides graphical summaries of the selection search. The following are highlights of the GLMSELECT procedure's features:

Model Specification

  • supports different parameterizations for classification effects
  • supports any degree of interaction (crossed effects) and nested effects
  • supports hierarchy among effects
  • supports partitioning of data into training, validation, and testing roles
  • supports constructed effects including spline and multimember effects
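
As a rough illustration of how several of these model specification features might be combined, the following sketch uses the Sashelp.Baseball sample data set that ships with SAS. The choice of variables, the seed value, the partition fractions, and the effect name splYrs are assumptions made for this example only.

```sas
/* Sketch only: variable choices, seed, and fractions are illustrative. */
proc glmselect data=sashelp.baseball seed=12345;
   /* Reference-cell parameterization for the classification effects */
   class league division / param=ref;

   /* Constructed effect: spline basis for years in the major leagues */
   effect splYrs = spline(yrMajor);

   /* Random split: 30% validation, 20% test, the rest training */
   partition fraction(validate=0.3 test=0.2);

   /* Crossed effect league*division; HIERARCHY=SINGLE keeps the
      interaction out of the model until both main effects are in it */
   model logSalary = nHits nRuns nRBI nBB league division
                     league*division splYrs
         / selection=forward(choose=validate) hierarchy=single;
run;
```

Here CHOOSE=VALIDATE asks the procedure to pick, among the models on the forward selection path, the one with the smallest error on the validation partition.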

Display and Output

  • produces graphical representation of the selection process
  • produces output data sets that contain predicted values and residuals
  • produces an output data set that contains the design matrix
  • produces macro variables that contain selected models
  • supports parallel processing of BY groups
  • supports multiple SCORE statements
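
The output features listed above could be requested along the following lines. This is a minimal sketch, again assuming the Sashelp.Baseball sample data; the output data set names (work.design, work.pred, work.scored) and the variable names p_logSalary and r_logSalary are hypothetical.

```sas
/* Sketch only: data set and variable names are illustrative. */
ods graphics on;

proc glmselect data=sashelp.baseball plots=all outdesign=work.design;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
         / selection=stepwise(select=sl);

   /* Predicted values and residuals for the input data */
   output out=work.pred predicted=p_logSalary residual=r_logSalary;

   /* Score a data set with the selected model; more SCORE
      statements can be added for other data sets */
   score data=sashelp.baseball out=work.scored;
run;

/* The macro variable _GLSIND holds the effects in the selected model */
%put Selected effects: &_GLSIND;
```

The macro variable can then be passed to another procedure's MODEL statement to refit the selected model.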

Selection Control

  • provides multiple effect selection methods including the following:
    • forward selection
    • backward elimination
    • stepwise regression
    • least angle regression (LAR)
    • least absolute shrinkage and selection operator (LASSO)
    • group LASSO
    • elastic net
    • hybrid versions of LAR and LASSO
  • enables selection from a very large number of effects (tens of thousands)
  • supports safe screening and sure independence screening methods
  • offers selection of individual levels of classification effects
  • provides effect selection based on a variety of selection criteria
  • provides stopping rules based on a variety of model evaluation criteria
  • provides leave-one-out and k-fold cross validation
  • supports data resampling and model averaging
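
To give a sense of how the selection controls might be used in practice, the following sketch runs LASSO selection twice on the Sashelp.Baseball sample data: once choosing the final model by 5-fold cross validation, and once with model averaging over resampled data. The variable list, seed, number of folds, and number of samples are assumptions chosen for illustration.

```sas
/* Sketch only: variable list, seed, folds, and samples are illustrative. */

/* LASSO path, final model chosen by 5-fold cross validation */
proc glmselect data=sashelp.baseball seed=2024;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;

/* Same selection method, repeated on 100 resampled data sets,
   with the resulting models averaged */
proc glmselect data=sashelp.baseball seed=2024;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=lasso(choose=sbc stop=none);
   modelaverage nsamples=100;
run;
```

STOP=NONE lets the LASSO path run to its end so that the CHOOSE= criterion, rather than an early stopping rule, determines the reported model.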

For further details, see the documentation for the GLMSELECT procedure.