The main features of the GLMSELECT procedure are as follows:
Model Specification
supports different parameterizations for classification effects
supports any degree of interaction (crossed effects) and nested effects
supports hierarchy among effects
supports partitioning of data into training, validation, and testing roles
supports constructed effects including spline and multimember effects
Selection Control
provides multiple effect selection methods
enables selection from a very large number of effects (tens of thousands)
offers selection of individual levels of classification effects
provides effect selection based on a variety of selection criteria
provides stopping rules based on a variety of model evaluation criteria
provides leave-one-out, k-fold cross validation, and k-fold external cross validation
supports data resampling and model averaging
Display and Output
produces a graphical representation of the selection process
produces output data sets containing predicted values and residuals
produces an output data set containing the design matrix
produces macro variables containing selected models
supports parallel processing of BY groups
supports multiple SCORE statements
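Two of the features listed above, partitioning the data into roles and choosing a model by an evaluation criterion, can be illustrated together. The following is a minimal Python sketch, not SAS code and not a description of how PROC GLMSELECT is implemented. It assumes numeric regressors only, a random split of roughly 60/20/20, greedy addition of one column per step by training average squared error (ASE), and a final choice of the step with the lowest validation ASE. The function names and the simulated data are hypothetical.

```python
import numpy as np

def partition_roles(n, frac_validate=0.2, frac_test=0.2, seed=0):
    """Randomly label each of n observations as train, validate, or test."""
    u = np.random.default_rng(seed).random(n)
    return np.where(u < frac_validate, "validate",
                    np.where(u < frac_validate + frac_test, "test", "train"))

def design(X, cols):
    """Design matrix: an intercept column followed by the requested columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def fit_ols(X, y, cols):
    """Ordinary least squares coefficients for intercept + cols."""
    beta, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return beta

def ase(X, y, cols, beta):
    """Average squared error of the fitted model on (X, y)."""
    return float(np.mean((y - design(X, cols) @ beta) ** 2))

def forward_choose_by_validation(X_tr, y_tr, X_va, y_va):
    """Grow the model greedily on training data; keep the step that
    gives the lowest validation ASE (assumed criterion for this sketch)."""
    remaining = list(range(X_tr.shape[1]))
    chosen, best_cols = [], []
    best_val = ase(X_va, y_va, [], fit_ols(X_tr, y_tr, []))
    while remaining:
        # Candidate whose addition gives the lowest training ASE.
        j_best = min(
            remaining,
            key=lambda j: ase(X_tr, y_tr, chosen + [j],
                              fit_ols(X_tr, y_tr, chosen + [j])),
        )
        chosen.append(j_best)
        remaining.remove(j_best)
        val = ase(X_va, y_va, chosen, fit_ols(X_tr, y_tr, chosen))
        if val < best_val:
            best_cols, best_val = list(chosen), val
    return best_cols, best_val

# Simulated data (hypothetical): only columns 0 and 3 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)

roles = partition_roles(len(y))
tr, va, te = roles == "train", roles == "validate", roles == "test"

best_cols, best_val = forward_choose_by_validation(X[tr], y[tr], X[va], y[va])
# The test role is used once, to assess the chosen model.
test_ase = ase(X[te], y[te], best_cols, fit_ols(X[tr], y[tr], best_cols))
print("chosen columns:", sorted(best_cols))
print("validation ASE:", best_val, "test ASE:", test_ase)
```

The test observations play no part in fitting or in choosing the model, which is the point of keeping a third role separate from validation.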
The GLMSELECT procedure supports the following effect selection methods. For more information about these methods, see the section Model-Selection Methods.
Forward selection starts with no effects in the model and adds effects.
Backward elimination starts with all effects in the model and deletes effects.
Stepwise regression is similar to forward selection except that effects already in the model do not necessarily stay there.
Least angle regression (LAR) is similar to forward selection in that it starts with no effects in the model and adds effects. The parameter estimates at any step are "shrunk" when compared to the corresponding least squares estimates.
LASSO adds and deletes parameters based on a version of ordinary least squares where the sum of the absolute regression coefficients is constrained.
Elastic net is an extension of LASSO that estimates parameters based on a version of ordinary least squares in which both the sum of the absolute regression coefficients and the sum of the squared regression coefficients are constrained.
Group LASSO is a variant of LASSO that estimates parameters based on a version of ordinary least squares in which the sum of the Euclidean norms of a group of regression coefficients is constrained.
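The three constrained least squares problems described above (LASSO, elastic net, and group LASSO) can be written out as follows. This is a sketch in generic notation that the text itself does not define: y is the response vector, X the design matrix with p columns, β the coefficient vector, t, t_1, and t_2 are assumed tuning constants, and G_1, ..., G_m are assumed disjoint groups of coefficient indices. Some formulations weight each group norm by a function of the group size; that weighting is omitted here.

```latex
% LASSO: the sum of absolute coefficients is constrained
\[
\min_{\beta}\; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
\]

% Elastic net: both the sum of absolute and the sum of squared coefficients are constrained
\[
\min_{\beta}\; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t_1
\quad \text{and} \quad \sum_{j=1}^{p} \beta_j^2 \le t_2
\]

% Group LASSO: the sum of the Euclidean norms of the coefficient groups is constrained
\[
\min_{\beta}\; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \sum_{g=1}^{m} \lVert \beta_{G_g} \rVert_2 \le t
\]
```

When every group contains a single coefficient, the group LASSO constraint reduces to the LASSO constraint, because the Euclidean norm of a single number is its absolute value.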
PROC GLMSELECT also supports hybrid versions of the LAR and LASSO methods. They use LAR and LASSO to select the model but then estimate the regression coefficients by ordinary weighted least squares.
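The idea behind the hybrid versions, selecting with a shrinkage method and then re-estimating without shrinkage, can be sketched in Python. This is an illustration under stated assumptions, not the procedure's algorithm: it solves a LASSO problem in penalized form at one fixed penalty by coordinate descent (a swapped-in technique chosen for brevity), takes the nonzero coefficients as the selected columns, and refits those columns by unweighted ordinary least squares. The penalty value, function names, and simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.
    Assumes X and y are already centered, so no intercept is needed."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # equals y - X @ beta while beta = 0
    col_ms = (X ** 2).mean(axis=0)        # mean square of each column
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]    # remove column j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_ms[j]
            resid -= X[:, j] * beta[j]    # add the updated contribution back
    return beta

def hybrid_refit(X, y, lam):
    """Select columns with LASSO, then re-estimate them by least squares."""
    beta_lasso = lasso_cd(X, y, lam)
    support = np.flatnonzero(beta_lasso)
    beta_ols = np.zeros(X.shape[1])
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta_ols[support] = coef
    return support, beta_lasso, beta_ols

# Simulated data (hypothetical): only columns 0 and 2 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.3, size=200)
Xc, yc = X - X.mean(axis=0), y - y.mean()   # center so the intercept drops out

support, beta_lasso, beta_ols = hybrid_refit(Xc, yc, lam=0.3)
print("selected columns:", support)
print("LASSO estimates: ", np.round(beta_lasso[support], 3))
print("refit estimates: ", np.round(beta_ols[support], 3))
```

The refit estimates for the selected columns are larger in magnitude than the LASSO estimates, which shows the shrinkage that the hybrid approach removes after selection.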
The GLMSELECT procedure is intended primarily as a model selection procedure and does not include regression diagnostics or other postselection facilities such as hypothesis testing, testing of contrasts, and LS-means analyses. The intention is that you use PROC GLMSELECT to select a model or a set of candidate models. Further investigation of these models can be done by using these models in existing regression procedures.