The HPGENSELECT Procedure

SELECTION Statement

  • SELECTION <options>;

The SELECTION statement performs model selection by examining whether effects should be added to or removed from the model according to the rules of the chosen model selection method. The statement is fully documented in the section “SELECTION Statement” in SAS/STAT 14.1 User's Guide: High-Performance Procedures.

The HPGENSELECT procedure supports the following effect-selection methods in the SELECTION statement:

METHOD=NONE

results in no model selection. This method fits the full model.

METHOD=BACKWARD

performs backward elimination. This method starts with all effects in the model and removes effects one at a time.

METHOD=FORWARD

performs forward selection. This method starts with no effects in the model and adds effects one at a time.

METHOD=LASSO

performs model selection by the group LASSO method. This method adds and removes effects by using a sequence of LASSO steps.

METHOD=STEPWISE

performs stepwise regression. This method is similar to the FORWARD method except that effects already in the model do not necessarily stay there.

For methods other than LASSO, the only effect-selection criterion that the HPGENSELECT procedure supports is SELECT=SL, in which effects enter and leave the model based on their significance levels. To determine the significance level of each candidate effect, PROC HPGENSELECT calculates an approximate chi-square test statistic. The SELECT= option is not supported by the LASSO method.
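The SL rule can be illustrated with a minimal Python sketch. It is not the procedure's implementation: the chi-square statistics and degrees of freedom are assumed to be supplied (PROC HPGENSELECT computes its own approximate statistics from the fitted models), the function names are hypothetical, and the 0.05 entry and stay thresholds are illustrative only. Forward selection repeats only the entry step, backward elimination repeats only the removal step, and stepwise selection alternates the two.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series (adequate for moderate x).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def forward_entry_step(candidates, sle=0.05):
    """One entry step: return (effect, p_value) for the effect to add, or None.

    candidates maps a hypothetical effect name to (chi_square, df), the test of
    adding that effect to the current model. The most significant candidate
    enters only if its p-value is at most sle.
    """
    if not candidates:
        return None
    pvalues = {name: chi2_sf(stat, df) for name, (stat, df) in candidates.items()}
    best = min(pvalues, key=pvalues.get)
    return (best, pvalues[best]) if pvalues[best] <= sle else None


def backward_removal_step(in_model, sls=0.05):
    """One removal step: return (effect, p_value) for the effect to remove, or None.

    in_model maps an effect already in the model to (chi_square, df), the test
    of that effect given the other effects. The least significant effect
    leaves only if its p-value exceeds sls.
    """
    if not in_model:
        return None
    pvalues = {name: chi2_sf(stat, df) for name, (stat, df) in in_model.items()}
    worst = max(pvalues, key=pvalues.get)
    return (worst, pvalues[worst]) if pvalues[worst] > sls else None


# Illustrative, made-up test statistics (not PROC HPGENSELECT output).
print(forward_entry_step({"x1": (12.3, 1), "c1": (5.0, 3), "x2": (0.4, 1)}))
print(backward_removal_step({"x1": (12.3, 1), "x2": (0.4, 1)}))
```

In this example x1 is the candidate that enters, and x2 is the effect that would be removed, because its p-value is well above the stay threshold.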

You can specify the following criteria in the CHOOSE= option:

AIC

specifies Akaike’s information criterion (Akaike 1974).

AICC

specifies a small-sample bias-corrected version of Akaike’s information criterion as promoted in Hurvich and Tsai (1989) and Burnham and Anderson (1998), among others.

BIC | SBC

specifies the Schwarz Bayesian criterion (Schwarz 1978).

VALIDATE

specifies the Bayesian information criterion (BIC) computed from validation data, if you specify validation data by using a PARTITION statement. This option is supported only for the METHOD=LASSO selection method.

You can specify the following criteria in the STOP= option:

SL

specifies the significance level of the test.

AIC

specifies Akaike’s information criterion (Akaike 1974).

AICC

specifies a small-sample bias-corrected version of Akaike’s information criterion as promoted in Hurvich and Tsai (1989) and Burnham and Anderson (1998), among others.

BIC | SBC

specifies the Schwarz Bayesian criterion (Schwarz 1978).
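How an information criterion in STOP= can end a selection path is sketched below in Python. This is an assumption-based illustration, not the procedure's documented algorithm: the function name is hypothetical, the criterion values are made up, and the rule modeled here (halt at the first step whose successor does not lower the criterion) may differ from PROC HPGENSELECT's actual stopping details, for example if the procedure looks ahead more than one step.

```python
from typing import Sequence


def stop_step(criterion_by_step: Sequence[float]) -> int:
    """Index of the last step taken when selection stops on an information criterion.

    Assumed rule: stop at the first step whose next step would not lower the
    criterion (smaller AIC, AICC, or BIC is better).
    """
    for k in range(len(criterion_by_step) - 1):
        if criterion_by_step[k + 1] >= criterion_by_step[k]:
            return k
    return len(criterion_by_step) - 1


# Hypothetical AIC values for the models at steps 0, 1, 2, ...
print(stop_step([510.2, 498.7, 495.1, 496.0, 494.3]))  # stops at step 2
```

Under this rule the path ends at step 2 even though step 4 has a smaller AIC, because step 3 does not improve on step 2.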

If you specify METHOD=LASSO and you do not specify either the CHOOSE= or STOP= option, then the model in the last LASSO step is chosen as the selected model.
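The difference between choosing by a criterion and the LASSO default can be shown with a short Python sketch. The function name and the AIC values are hypothetical, and breaking ties in favor of the earliest step is an assumption rather than documented behavior; the sketch only assumes that a CHOOSE= criterion selects the step with the smallest value, while METHOD=LASSO without CHOOSE= or STOP= selects the last step, as stated above.

```python
from typing import Sequence


def chosen_step(criterion_by_step: Sequence[float], choose: bool = True) -> int:
    """Index of the selected model among the steps of a selection path.

    With choose=True (a CHOOSE= criterion), the step with the smallest
    criterion value is selected; on ties the earliest step wins (an assumption).
    With choose=False, which models METHOD=LASSO with neither CHOOSE= nor STOP=,
    the last step is selected.
    """
    if not criterion_by_step:
        raise ValueError("selection path is empty")
    if not choose:
        return len(criterion_by_step) - 1
    return min(range(len(criterion_by_step)), key=lambda k: criterion_by_step[k])


# Hypothetical AIC values along a LASSO path.
path = [510.2, 498.7, 495.1, 496.0, 497.3]
print(chosen_step(path))                # step 2, the AIC minimum
print(chosen_step(path, choose=False))  # step 4, the last step
```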

The information criteria are calculated by the following formulas, where p denotes the number of effective parameters in the candidate model, f denotes the sum of the frequencies used (the number of observations when no FREQ statement is specified), and l is the log likelihood evaluated at the converged estimates:

\begin{align*}
\mathrm{AIC} &= -2l + 2p \\
\mathrm{AICC} &= \begin{cases} -2l + 2pf/(f-p-1) & \text{when } f > p+2 \\ -2l + 2p(p+2) & \text{otherwise} \end{cases} \\
\mathrm{BIC} &= -2l + p\log(f)
\end{align*}
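The formulas above translate directly into a small Python function, shown here as a sketch for checking values by hand. The function name and the sample inputs are hypothetical; log denotes the natural logarithm, and the AICC branch follows the f > p+2 condition exactly as written.

```python
import math


def information_criteria(loglik: float, p: int, f: float) -> dict:
    """AIC, AICC, and BIC for log likelihood l, p effective parameters, and f frequencies."""
    aic = -2.0 * loglik + 2.0 * p
    if f > p + 2:
        aicc = -2.0 * loglik + 2.0 * p * f / (f - p - 1)
    else:
        # Small-sample branch used when f <= p + 2.
        aicc = -2.0 * loglik + 2.0 * p * (p + 2)
    bic = -2.0 * loglik + p * math.log(f)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


# Hypothetical fit: l = -250, p = 5 effective parameters, f = 100.
print(information_criteria(-250.0, 5, 100))
```

With these inputs the three criteria share the term -2l = 500 and differ only in their penalties: 2p = 10 for AIC, 2pf/(f-p-1) = 1000/94 for AICC, and p log(100) for BIC.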

If you specify the PARTITION statement, then the AIC, AICC, BIC, and SL statistics are computed on the training data set; otherwise they are computed on the full data set.

When you specify one of the following DETAILS= options in the SELECTION statement, the HPGENSELECT procedure produces the indicated tables:

DETAILS=SUMMARY

produces a summary table that shows which effect is added or removed at each step along with the p-value. The summary table is produced by default if you do not specify the DETAILS= option. This option has no effect when you use the LASSO method.

DETAILS=STEPS

produces a table of selection details that displays fit statistics for the model at each step of the selection process and the approximate log p-value. The summary table that results from the DETAILS=SUMMARY option is also produced. This option has no effect when you use the LASSO method.

DETAILS=ALL

for methods other than LASSO, produces all the tables that are produced when DETAILS=STEPS and also produces a table that displays the effect that is added or removed at each step along with the p-value, chi-square statistic, and fit statistics for the model. For the LASSO method, it produces a table that displays the effects that are added or removed at each step; the LASSO regularization parameter; and the AIC, AICC, and BIC fit statistics.