The HPGENSELECT Procedure

SELECTION Statement

  • SELECTION <options>;

The SELECTION statement performs model selection by examining whether effects should be added to or removed from the model according to rules that are defined by model selection methods. The statement is fully documented in the section SELECTION Statement in Chapter 4: Shared Statistical Concepts.

The HPGENSELECT procedure supports the following effect-selection methods in the SELECTION statement:

METHOD=NONE

results in no model selection. This method fits the full model.

METHOD=FORWARD

performs forward selection. This method starts with no effects in the model and adds effects.

METHOD=BACKWARD

performs backward elimination. This method starts with all effects in the model and deletes effects.

METHOD=STEPWISE

performs stepwise regression. This method is similar to the FORWARD method except that effects already in the model do not necessarily stay there.
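As a minimal sketch of how one of these methods is requested, the following step asks for stepwise selection in a Poisson model. The data set `work.claims` and all variable names are hypothetical and chosen only for illustration; substitute your own data and distribution.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc hpgenselect data=work.claims;
   class region vehicle_type;
   model n_claims = region vehicle_type age mileage premium
         / distribution=poisson;
   /* Effects can enter and later leave the model */
   selection method=stepwise;
run;
```

Replacing STEPWISE with FORWARD, BACKWARD, or NONE requests the corresponding method described above.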

The only effect-selection criterion supported by the HPGENSELECT procedure is SELECT=SL, where effects enter and leave the model based on an evaluation of the significance level. To determine this level of significance for each candidate effect, the HPGENSELECT procedure calculates an approximate chi-square test statistic.

The following criteria are available for the CHOOSE= option in the SELECTION statement:

AIC

Akaike’s information criterion (Akaike, 1974)

AICC

a small-sample bias-corrected version of Akaike’s information criterion, as promoted in Hurvich and Tsai (1989) and Burnham and Anderson (1998), among others

BIC | SBC

Schwarz Bayesian criterion (Schwarz, 1978)

The following criteria are available for the STOP= option in the SELECTION statement:

SL

the significance level of the test

AIC

Akaike’s information criterion (Akaike, 1974)

AICC

a small-sample bias-corrected version of Akaike’s information criterion, as promoted in Hurvich and Tsai (1989) and Burnham and Anderson (1998), among others

BIC | SBC

Schwarz Bayesian criterion (Schwarz, 1978)
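The following sketch shows one way the SELECT=, CHOOSE=, and STOP= criteria might be combined. It assumes the usual placement of these criteria as method options in parentheses after the METHOD= keyword, as described in the shared SELECTION statement documentation. The data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc hpgenselect data=work.claims;
   class region vehicle_type;
   model n_claims = region vehicle_type age mileage premium
         / distribution=poisson;
   /* SELECT=SL : effects enter based on significance level       */
   /* STOP=BIC  : criterion used to decide when selection stops   */
   /* CHOOSE=AICC : criterion used to pick the final model from   */
   /*               the sequence of models visited                */
   selection method=forward(select=sl choose=aicc stop=bic);
run;
```

Because SELECT=SL is the only effect-selection criterion the procedure supports, specifying it explicitly here only documents the intent.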

The calculation of the information criteria uses the following formulas, where p denotes the number of effective parameters in the candidate model, f denotes the number of frequencies used, and l is the log likelihood evaluated at the converged estimates:

\begin{align*}
\mathrm{AIC} &= -2l + 2p \\
\mathrm{AICC} &= \begin{cases} -2l + \dfrac{2pf}{f-p-1} & \text{when } f > p+2 \\ -2l + 2p(p+2) & \text{otherwise} \end{cases} \\
\mathrm{BIC} &= -2l + p\log(f)
\end{align*}
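As a worked illustration of these formulas, consider made-up values p = 5, f = 100, and l = -120. These numbers are not taken from any real fit, and log is assumed to be the natural logarithm. Because f > p + 2, the first branch of the AICC formula applies:

```latex
\begin{align*}
\mathrm{AIC}  &= -2(-120) + 2(5) = 250 \\
\mathrm{AICC} &= -2(-120) + \frac{2(5)(100)}{100-5-1} = 240 + \frac{1000}{94} \approx 250.64 \\
\mathrm{BIC}  &= -2(-120) + 5\log(100) \approx 240 + 23.03 = 263.03
\end{align*}
```

With these values BIC applies the largest penalty, so it favors smaller models than AIC or AICC do.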

If the PARTITION statement is specified, then the AIC, AICC, BIC, and SL statistics are computed on the training data set; otherwise, they are computed on the full data set.
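A minimal sketch of combining a PARTITION statement with model selection follows. It assumes the FRACTION form of the PARTITION statement and a 30% validation fraction chosen arbitrarily; the data set and variable names are hypothetical. Consult the PARTITION statement documentation for the exact options available in your release.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc hpgenselect data=work.claims;
   class region vehicle_type;
   model n_claims = region vehicle_type age mileage premium
         / distribution=poisson;
   /* Reserve roughly 30% of observations for validation (assumed value) */
   partition fraction(validate=0.3);
   /* AIC is then computed on the training observations only */
   selection method=forward(choose=aic);
run;
```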

When you specify one of the following DETAILS= options in the SELECTION statement, the HPGENSELECT procedure produces the indicated tables:

DETAILS=SUMMARY

produces a summary table that shows which effect is added or removed at each step along with the p-value. The summary table is produced by default if the DETAILS= option is not specified.

DETAILS=STEPS

produces a table of selection details that displays fit statistics for the model at each step of the selection process and the approximate log p-value. The summary table that results from the DETAILS=SUMMARY option is also produced.

DETAILS=ALL

produces all the tables that are produced when DETAILS=STEPS is specified and also produces a table that displays the effect that is added or removed at each step along with the p-value, chi-square statistic, and fit statistics for the model.
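The DETAILS= option is written as an option of the SELECTION statement itself, outside the parentheses that hold the method options. The following sketch requests the most detailed output for a backward elimination; the data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc hpgenselect data=work.claims;
   class region vehicle_type;
   model n_claims = region vehicle_type age mileage premium
         / distribution=poisson;
   /* DETAILS=ALL adds per-step tables to the default summary table */
   selection method=backward details=all;
run;
```

Omitting DETAILS= or specifying DETAILS=SUMMARY yields only the summary table.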