The REG Procedure

Model Fit and Diagnostic Statistics

This section gathers the formulas for the statistics available in the MODEL, PLOT, and OUTPUT statements. The model to be fit is $\mb {Y}=\mb {X}\bbeta + \bepsilon $, and the parameter estimate is denoted by $\mb {b}=(\mb {X}’\mb {X})^{-}\mb {X}’\mb {Y}$. The subscript i denotes values for the ith observation, the parenthetical subscript $(i)$ means that the statistic is computed by using all observations except the ith observation, and the subscript jj indicates the jth diagonal matrix entry. The ALPHA= option in the PROC REG or MODEL statement sets the $\alpha $ value for the t statistics.
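
As a minimal sketch of the estimate $\mb {b}$ defined above, the following Python fragment solves the normal equations for a toy data set with an intercept and one regressor. The data values and the helper names `cross_products` and `solve_2x2` are illustrative assumptions, not part of PROC REG, and the ordinary inverse stands in for the generalized inverse because this design matrix has full column rank.

```python
# Hypothetical toy data: five observations, one regressor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Design matrix X with a leading column of ones for the intercept (p = 2).
X = [[1.0, xi] for xi in x]


def cross_products(X, y):
    """Return X'X (p x p) and X'Y (length p) as plain lists."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return xtx, xty


def solve_2x2(a, v):
    """Solve the 2 x 2 system a b = v by the explicit inverse (full rank assumed)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    b0 = (a[1][1] * v[0] - a[0][1] * v[1]) / det
    b1 = (a[0][0] * v[1] - a[1][0] * v[0]) / det
    return [b0, b1]


xtx, xty = cross_products(X, y)
b = solve_2x2(xtx, xty)   # b = (X'X)^{-1} X'Y
print(b)                  # intercept and slope, approximately 2.2 and 0.6
```

When $\mb {X}$ is not of full column rank the determinant above is zero, and a generalized inverse is needed instead; that case is outside this sketch.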

Table 83.7 contains the summary statistics for assessing the fit of the model.

Table 83.7: Formulas and Definitions for Model Fit Summary Statistics

MODEL Option or Statistic

Definition or Formula

n

the number of observations

p

the number of parameters including the intercept

i

1 if there is an intercept, 0 otherwise

$\hat{\sigma }^2$

the estimate of pure error variance from the SIGMA= option or from fitting the full model

$\mbox{SST}_0$

the uncorrected total sum of squares for the dependent variable

$\mbox{SST}_1 $

the total sum of squares corrected for the mean for the dependent variable

$\mbox{SSE}$

the error sum of squares

$\mbox{MSE}$

$\displaystyle \frac{\mbox{SSE}}{n-p}$

$R^2 $

$1 - \displaystyle \frac{\mbox{SSE}}{\mbox{SST}_ i} $

$\mbox{ADJRSQ}$

$1 - \displaystyle \frac{(n-i)(1-R^2)}{n-p}$

$\mbox{AIC} $

$\displaystyle n \ln \left( \frac{\mbox{SSE}}{n} \right) + 2p$

$\mbox{BIC} $

$\displaystyle n \ln \left( \frac{\mbox{SSE}}{n} \right) + 2(p+2)q - 2q^2 \mbox{ where } q = \frac{n \hat{\sigma }^2}{\mbox{SSE}}$

$\mbox{CP } (C_ p)$

$\displaystyle \frac{\mbox{SSE}}{\hat{\sigma }^2} + 2p - n$

$\mbox{GMSEP}$

$\displaystyle \frac{\mbox{MSE}(n+1)(n-2)}{n(n-p-1)} = \frac{1}{n} S_ p(n+1)(n-2)$

$\mbox{JP } (J_ p)$

$\displaystyle \frac{n+p}{n} \mbox{MSE} $

$\mbox{PC} $

$\displaystyle \frac{n+p}{n-p} (1 - R^2) = J_ p \left( \frac{n}{\mbox{SST}_ i} \right)$

$\mbox{PRESS}$

the sum of squares of $\mbox{predr}_ i$ (see Table 83.8)

$\mbox{RMSE}$

$\sqrt {\mbox{MSE}}$

$\mbox{SBC}$

$\displaystyle n \ln \left( \frac{\mbox{SSE}}{n} \right) + p \ln (n) $

$\mbox{SP } (S_ p)$

$\displaystyle \frac{\mbox{MSE}}{n-p-1} $
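
The formulas in Table 83.7 can be evaluated directly from a handful of summary quantities. The Python sketch below does this; the function name `fit_summary`, its argument names, and the input values are hypothetical, and $\hat{\sigma }^2$ is assumed to be supplied by the caller (here it is set to the MSE of the same model, as if that model were the full model). PRESS is omitted because it needs per-observation residuals.

```python
import math


def fit_summary(n, p, intercept, sse, sst0, sst1, sigma2):
    """Evaluate the Table 83.7 formulas from summary quantities.

    intercept is 1 if the model has an intercept and 0 otherwise;
    sigma2 is the pure error variance estimate (assumed supplied by the caller).
    """
    sst = sst1 if intercept else sst0      # SST_i in the table
    mse = sse / (n - p)
    r2 = 1.0 - sse / sst
    q = n * sigma2 / sse                   # used only by BIC
    log_term = n * math.log(sse / n)
    return {
        "MSE": mse,
        "RSQ": r2,
        "ADJRSQ": 1.0 - (n - intercept) * (1.0 - r2) / (n - p),
        "AIC": log_term + 2 * p,
        "BIC": log_term + 2 * (p + 2) * q - 2 * q * q,
        "CP": sse / sigma2 + 2 * p - n,
        "GMSEP": mse * (n + 1) * (n - 2) / (n * (n - p - 1)),
        "JP": (n + p) / n * mse,
        "PC": (n + p) / (n - p) * (1.0 - r2),
        "RMSE": math.sqrt(mse),
        "SBC": log_term + p * math.log(n),
        "SP": mse / (n - p - 1),
    }


# Hypothetical inputs: n = 5, p = 2, SSE = 2.4, SST_0 = 86, SST_1 = 6,
# and sigma2 taken as the MSE of this same model (0.8).
stats = fit_summary(n=5, p=2, intercept=1, sse=2.4, sst0=86.0, sst1=6.0, sigma2=0.8)
for name, value in stats.items():
    print(f"{name:7s} {value: .4f}")
```

Because `sigma2` equals this model's own MSE in the example, $C_ p$ reduces to p, which gives a quick consistency check on the inputs.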


Table 83.8 contains the diagnostic statistics and their formulas; these formulas and further information can be found in Chapter 4: Introduction to Regression Procedures, and in the section Influence Statistics. Each statistic is computed for each observation.

Table 83.8: Formulas and Definitions for Diagnostic Statistics

MODEL Option or Statistic

Formula

PRED ($\displaystyle \widehat{\mb {Y}}_ i$)

$\mb {x}_ i\mb {b}$

RES ($r_ i$)

$\mb {Y}_ i - \widehat{\mb {Y}}_ i$

H ($h_ i$)

$\mb {x}_ i(\mb {X’}\mb {X})^{-}\mb {x}_ i’$

STDP

$\sqrt {h_ i\widehat{\sigma }^2}$

STDI

$\sqrt {(1+h_ i)\widehat{\sigma }^2}$

STDR

$\sqrt {(1-h_ i)\widehat{\sigma }^2}$

LCL

$\displaystyle \widehat{Y}_ i-t_{\frac{\alpha }{2}}\mbox{STDI}$

LCLM

$\displaystyle \widehat{Y}_ i-t_{\frac{\alpha }{2}}\mbox{STDP}$

UCL

$\displaystyle \widehat{Y}_ i+t_{\frac{\alpha }{2}}\mbox{STDI}$

UCLM

$\displaystyle \widehat{Y}_ i+t_{\frac{\alpha }{2}}\mbox{STDP}$

STUDENT

$\displaystyle \frac{r_ i}{\mbox{STDR}_ i}$

RSTUDENT

$\displaystyle \frac{r_ i}{{\hat{\sigma }}_{(i)}\sqrt {1-h_ i}}$

COOKD

$\displaystyle \frac{1}{p}\mbox{STUDENT}^2\frac{\mbox{STDP}^2}{\mbox{STDR}^2}$

COVRATIO

$\displaystyle \frac{\mbox{det}({\hat{\sigma }^2}_{(i)}(\mb {X}_{(i)}’\mb {X}_{(i)})^{-1})}{\mbox{det}({\hat{\sigma }^2}(\mb {X}’\mb {X})^{-1})}$

DFFITS

$\displaystyle \frac{(\widehat{\mb {Y}}_ i-\widehat{\mb {Y}}_{(i)})}{({\hat{\sigma }}_{(i)}\sqrt {h_ i})}$

DFBETAS$_ j$

$\displaystyle \frac{\mb {b}_ j-\mb {b}_{(i)j}}{{\hat{\sigma }}_{(i)}\sqrt {[(\mb {X}’\mb {X})^{-1}]_{jj}}}$

PRESS($\mbox{predr}_ i$)

$\displaystyle \frac{r_ i}{1-h_ i}$
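
Several of the Table 83.8 formulas can be checked numerically by refitting the model without each observation in turn. The Python sketch below does this for a straight-line model with an intercept; the toy data, the helper `fit_line`, and the dictionary keys are hypothetical, and $\widehat{\sigma }^2$ is assumed to be the MSE of the fitted model. The confidence limits, COVRATIO, and DFBETAS are left out to keep the example short (the limits also need a t quantile, which the Python standard library does not provide).

```python
import math

# Hypothetical toy data: five observations, one regressor plus an intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Least squares intercept and slope for a single regressor."""
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((u - xbar) ** 2 for u in xs)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


n, p = len(x), 2
b0, b1 = fit_line(x, y)
xbar = sum(x) / n
sxx = sum((u - xbar) ** 2 for u in x)

pred = [b0 + b1 * u for u in x]                     # PRED
res = [v - f for v, f in zip(y, pred)]              # RES
sigma2 = sum(r * r for r in res) / (n - p)          # assumption: sigma-hat^2 = MSE
h = [1.0 / n + (u - xbar) ** 2 / sxx for u in x]    # H for a straight-line model

rows = []
for i in range(n):
    stdp = math.sqrt(h[i] * sigma2)
    stdi = math.sqrt((1.0 + h[i]) * sigma2)
    stdr = math.sqrt((1.0 - h[i]) * sigma2)
    student = res[i] / stdr

    # Refit without observation i to get the "(i)" quantities directly.
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a0, a1 = fit_line(xs, ys)
    sse_del = sum((v - (a0 + a1 * u)) ** 2 for u, v in zip(xs, ys))
    sigma_del = math.sqrt(sse_del / (n - 1 - p))    # sigma-hat_(i)
    pred_del = a0 + a1 * x[i]                       # Y-hat_(i)

    rows.append({
        "STDP": stdp, "STDI": stdi, "STDR": stdr,
        "STUDENT": student,
        "RSTUDENT": res[i] / (sigma_del * math.sqrt(1.0 - h[i])),
        "COOKD": student ** 2 * stdp ** 2 / stdr ** 2 / p,
        "DFFITS": (pred[i] - pred_del) / (sigma_del * math.sqrt(h[i])),
        "PREDR": res[i] / (1.0 - h[i]),
        "LOO_RESIDUAL": y[i] - pred_del,            # should match PREDR
    })

press = sum(row["PREDR"] ** 2 for row in rows)      # PRESS from Table 83.7
print(round(press, 4))
```

In each row, `PREDR` computed from the leverage should agree with `LOO_RESIDUAL` from the explicit refit, which is why PRESS can be obtained without refitting the model n times.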