The GLMSELECT Procedure

Criteria Used in Model Selection Methods

PROC GLMSELECT supports a variety of fit statistics that you can specify as criteria for the CHOOSE=, SELECT=, and STOP= options in the MODEL statement. The following statistics are available:


ADJRSQ: adjusted R-square statistic (Darlington, 1968; Judge et al., 1985)

AIC: Akaike's information criterion (Darlington, 1968; Judge et al., 1985)

AICC: corrected Akaike's information criterion (Hurvich and Tsai, 1989)

BIC: Sawa Bayesian information criterion (Sawa, 1978; Judge et al., 1985)

CP: Mallows' $C_p$ statistic (Mallows, 1973; Hocking, 1976)

PRESS: predicted residual sum of squares statistic

SBC: Schwarz Bayesian information criterion (Schwarz, 1978; Judge et al., 1985)

SL: significance level of the F statistic used to assess an effect's contribution to the fit when it is added to or removed from a model

VALIDATE: average square error over the validation data

Table 47.10 provides formulas and definitions for the fit statistics.

Table 47.10: Formulas and Definitions for Model Fit Summary Statistics


| Statistic | Definition or Formula |
|---|---|
| $n$ | Number of observations |
| $p$ | Number of parameters including the intercept |
| $\hat{\sigma}^2$ | Estimate of pure error variance from fitting the full model |
| SST | Total sum of squares corrected for the mean for the dependent variable |
| SSE | Error sum of squares |
| ASE | $\frac{\mbox{SSE}}{n}$ |
| MSE | $\frac{\mbox{SSE}}{n-p}$ |
| $R^2$ | $1 - \frac{\mbox{SSE}}{\mbox{SST}}$ |
| ADJRSQ | $1 - \frac{(n-1)(1-R^2)}{n-p}$ |
| AIC | $n \log \left( \frac{\mbox{SSE}}{n} \right) + 2p + n + 2$ |
| AICC | $n \log \left( \frac{\mbox{SSE}}{n} \right) + \frac{n(n+p)}{n-p-2}$ |
| BIC | $n \log \left( \frac{\mbox{SSE}}{n} \right) + 2(p+2)q - 2q^2$ where $q = \frac{n \hat{\sigma}^2}{\mbox{SSE}}$ |
| CP ($C_p$) | $\frac{\mbox{SSE}}{\hat{\sigma}^2} + 2p - n$ |
| PRESS | $\sum_{i=1}^{n} \frac{r_i^2}{(1-h_i)^2}$ where $r_i$ = residual at observation $i$ and $h_i$ = leverage of observation $i$ = $\mathbf{x}_i (\mathbf{X}'\mathbf{X})^{-} \mathbf{x}_i'$ |
| RMSE | $\sqrt{\mbox{MSE}}$ |
| SBC | $n \log \left( \frac{\mbox{SSE}}{n} \right) + p \log(n)$ |
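As an illustration of how the formulas in Table 47.10 fit together, the following Python sketch evaluates the scalar statistics for a single candidate model. It is not part of PROC GLMSELECT: the function name `fit_statistics`, its argument names, the dictionary keys, and the input values are all assumptions made for this example. PRESS is omitted because it requires the individual residuals and leverages rather than summary quantities.

```python
import math

def fit_statistics(n, p, sse, sst, sigma2_full):
    """Evaluate the Table 47.10 formulas for one candidate model.

    n           -- number of observations
    p           -- number of parameters, including the intercept
    sse         -- error sum of squares of the candidate model
    sst         -- total sum of squares corrected for the mean
    sigma2_full -- pure error variance estimate from the full model
    (All names are hypothetical; they are not SAS identifiers.)
    """
    mse = sse / (n - p)
    r2 = 1.0 - sse / sst
    q = n * sigma2_full / sse          # q as defined in the BIC row
    log_term = n * math.log(sse / n)   # shared by AIC, AICC, BIC, SBC
    return {
        "ASE": sse / n,
        "MSE": mse,
        "R2": r2,
        "ADJRSQ": 1.0 - (n - 1) * (1.0 - r2) / (n - p),
        "AIC": log_term + 2 * p + n + 2,
        "AICC": log_term + n * (n + p) / (n - p - 2),
        "BIC": log_term + 2 * (p + 2) * q - 2 * q ** 2,
        "CP": sse / sigma2_full + 2 * p - n,
        "RMSE": math.sqrt(mse),
        "SBC": log_term + p * math.log(n),
    }

# Hypothetical inputs, chosen only to exercise the formulas.
stats = fit_statistics(n=50, p=4, sse=120.0, sst=400.0, sigma2_full=2.5)
for name, value in stats.items():
    print(f"{name:8s}{value:12.4f}")
```

Because AIC, AICC, BIC, and SBC share the term $n \log(\mbox{SSE}/n)$, the sketch computes it once; the criteria differ only in the penalty added to that term.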

Formulas for AIC and AICC

There is some inconsistency in the literature on the precise definitions for the AIC and AICC statistics. The definitions used in PROC GLMSELECT changed between the experimental and the production release of the procedure in SAS 9.2. The definitions now used yield the same final models as before, but they make the connection between the AIC statistic and the AICC statistic more transparent.

In the context of linear regression, several different versions of the formulas for AIC and AICC appear in the statistics literature. However, for a fixed number of observations, these versions differ only by additive and positive multiplicative constants. Because the model that minimizes a criterion is unchanged when the criterion is shifted by an additive constant or scaled by a positive multiplicative constant, these changes in the formulas for AIC and AICC do not affect the selection process.

The following section provides details about these changes. Formulas used in the experimental download release are denoted with a superscript of $(d)$. The quantities $n$, $p$, and $\mbox{SSE}$ are defined in Table 47.10.

The experimental download release of PROC GLMSELECT used the following formulas for AIC (Darlington, 1968; Judge et al., 1985) and AICC (Hurvich, Simonoff, and Tsai, 1998):

\[  \mbox{AIC}^{(d)} = n \log \left( \frac{\mbox{SSE}}{n} \right) + 2p  \]

\[  \mbox{AICC}^{(d)} = \log \left( \frac{\mbox{SSE}}{n} \right) + 1 + \frac{2(p+1)}{n-p-2}  \]

PROC GLMSELECT now uses the definitions of AIC and AICC found in Hurvich and Tsai (1989):

\[  \mbox{AIC} = n \log \left( \frac{\mbox{SSE}}{n} \right) + 2p + n + 2  \]

\[  \mbox{AICC} = \mbox{AIC} + \frac{2(p+1)(p+2)}{n-p-2}  \]

Hurvich and Tsai (1989) show that the formula for AICC can also be written as

\[  \mbox{AICC} = n \log \left( \frac{\mbox{SSE}}{n} \right) + \frac{n(n+p)}{n-p-2}  \]
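
The equivalence of these two forms of AICC can be checked by collecting the terms that do not involve the logarithm over a common denominator. The following short derivation is added as a worked check and uses only the symbols defined in Table 47.10:

\[  2p + n + 2 + \frac{2(p+1)(p+2)}{n-p-2} = \frac{(n+2p+2)(n-p-2) + 2(p+1)(p+2)}{n-p-2}  \]

\[  (n+2p+2)(n-p-2) + 2(p+1)(p+2) = \left( n^2 + np - 2p^2 - 6p - 4 \right) + \left( 2p^2 + 6p + 4 \right) = n(n+p)  \]

Adding $n \log \left( \mbox{SSE}/n \right)$ to both sides of the first line then gives the second form of AICC.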

The relationships between the alternative forms of the formulas are

\[  \mbox{AIC} = \mbox{AIC}^{(d)} + n + 2  \]

\[  \mbox{AICC} = n \, \mbox{AICC}^{(d)}  \]
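
These relationships, and the claim that the change of definition does not alter the selected model, can be checked numerically. The following Python sketch is only an illustration: the function names, the sample size, and the candidate $(p, \mbox{SSE})$ pairs are hypothetical values invented for the check and do not come from PROC GLMSELECT output.

```python
import math

def aic_d(n, p, sse):
    # AIC as defined in the experimental download release
    return n * math.log(sse / n) + 2 * p

def aicc_d(n, p, sse):
    # AICC as defined in the experimental download release
    return math.log(sse / n) + 1 + 2 * (p + 1) / (n - p - 2)

def aic(n, p, sse):
    # Current AIC definition (Hurvich and Tsai, 1989)
    return n * math.log(sse / n) + 2 * p + n + 2

def aicc(n, p, sse):
    # Current AICC definition, written as AIC plus a correction term
    return aic(n, p, sse) + 2 * (p + 1) * (p + 2) / (n - p - 2)

# Hypothetical candidate models: (number of parameters, error sum of squares)
n = 40
candidates = [(2, 310.0), (3, 215.0), (4, 190.0), (5, 188.5), (6, 188.0)]

# The two definitions differ by an additive or a multiplicative constant.
for p, sse in candidates:
    assert math.isclose(aic(n, p, sse), aic_d(n, p, sse) + n + 2)
    assert math.isclose(aicc(n, p, sse), n * aicc_d(n, p, sse))

def best(criterion):
    # Candidate that minimizes the given criterion
    return min(candidates, key=lambda c: criterion(n, c[0], c[1]))

print("Same model under both AIC definitions: ", best(aic) == best(aic_d))
print("Same model under both AICC definitions:", best(aicc) == best(aicc_d))
```

For a fixed $n$, adding $n + 2$ or multiplying by $n > 0$ preserves the ordering of the candidates, which is why each pair of definitions selects the same model.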