Introduction to Statistical Modeling with SAS/STAT Software


Analysis of Variance

The identity

\[ \bY = \bX \tilde{\bbeta } + \left(\bY -\bX \tilde{\bbeta }\right) \]

holds for all vectors $\tilde{\bbeta }$, but only for the least squares solution is the residual $(\bY -\bX \widehat{\bbeta })$ orthogonal to the predicted value $\bX \widehat{\bbeta }$. Because of this orthogonality, the additive identity holds not only for the vectors themselves, but also for their lengths (Pythagorean theorem):

\[ ||\bY ||^2 = ||\bX \widehat{\bbeta }||^2 + ||(\bY -\bX \widehat{\bbeta })||^2 \]
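As a numerical illustration, the following PROC IML statements (the data values are hypothetical, chosen only for display) compute a least squares fit and verify both the orthogonality of the fitted values and residuals and the resulting additivity of their squared lengths:

proc iml;
   /* hypothetical data, chosen only for illustration */
   X = {1 1, 1 2, 1 4, 1 5};                /* intercept column and one regressor */
   Y = {2, 3, 5, 7};
   b_hat = inv(X`*X) * X`*Y;                /* least squares solution */
   fit   = X * b_hat;                       /* fitted values X*b_hat */
   res   = Y - fit;                         /* residuals */
   print (fit`*res)[label="fit'res (zero by orthogonality)"];
   print (ssq(Y))[label="||Y||**2"]
         (ssq(fit) + ssq(res))[label="||Xb||**2 + ||Y-Xb||**2"];
quit;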

Note that $\bX \widehat{\bbeta } = \bX \left(\bX ’\bX \right)^{-1}\bX ’\bY = \bH \bY $ and that $\bY - \bX \widehat{\bbeta } = (\bI - \bH )\bY = \bM \bY $. The matrices $\mb{H}$ and $\bM = \bI -\bH $ play an important role in the theory of linear models and in statistical computations. Both are projection matrices; that is, they are symmetric and idempotent. (An idempotent matrix $\bA $ is a square matrix that satisfies $\bA \bA = \bA $. The eigenvalues of an idempotent matrix take on the values 1 and 0 only.) The matrix $\mb{H}$ projects onto the subspace of $R^ n$ that is spanned by the columns of $\bX $, and the matrix $\bM $ projects onto the orthogonal complement of that subspace. Because of these properties you have $\mb{H}’=\mb{H}$, $\mb{HH}=\mb{H}$, $\mb{M}’=\mb{M}$, $\mb{MM}=\mb{M}$, and $\mb{HM}=\mb{0}$.
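A short PROC IML sketch, continuing with the same hypothetical data, confirms these projection properties numerically (each printed value should be zero up to rounding error):

proc iml;
   X = {1 1, 1 2, 1 4, 1 5};                /* hypothetical data */
   H = X * inv(X`*X) * X`;                  /* projects onto the column space of X */
   M = I(nrow(X)) - H;                      /* projects onto its orthogonal complement */
   print (max(abs(H` - H)))[label="H'-H"]
         (max(abs(H*H - H)))[label="HH-H"]
         (max(abs(H*M)))[label="HM"];
quit;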

The Pythagorean relationship can now be written in terms of $\bH $ and $\bM $ as follows:

\[ ||\bY ||^2 = \bY ’\bY = ||\mb{HY}||^2 + ||\mb{MY}||^2 = \bY ’\bH ’\bH \bY + \bY ’\bM ’\bM \bY = \bY ’\mb{HY} + \bY ’\mb{MY} \]

If $\bX ’\bX $ is deficient in rank and a generalized inverse is used to solve the normal equations, then you work instead with the projection matrices $\bH =\bX \left(\bX ’\bX \right)^{-}\bX ’$ and $\bM = \bI - \bH $. Note that if $\bG $ is a generalized inverse of $\bX ’\bX $, then $\mb{XGX}’$, and hence also $\bH $ and $\bM $, are invariant to the choice of $\bG $.
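The following PROC IML sketch illustrates this for a hypothetical rank-deficient design. The GINV function returns the Moore-Penrose inverse, which is one particular choice of $\bG $; by the invariance just noted, any other generalized inverse would yield the same $\bH $:

proc iml;
   /* rank-deficient design: the third column is the sum of the first two */
   X = {1 1 2, 1 2 3, 1 4 5, 1 5 6};
   G = ginv(X`*X);                          /* Moore-Penrose inverse, one choice of G */
   H = X*G*X`;
   M = I(nrow(X)) - H;
   print (max(abs(H*H - H)))[label="H idempotent"]
         (max(abs(H*X - X)))[label="H reproduces the columns of X"]
         (max(abs(H*M)))[label="HM"];
quit;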

The matrix $\bH $ is sometimes referred to as the "hat" matrix because when you premultiply the vector of observations by $\mb{H}$, you produce the fitted values, which are commonly denoted by placing a "hat" over the $\bY $ vector, $\widehat{\bY } = \mb{HY}$.

The term $\bY ’\bY $ is the uncorrected total sum of squares ($\mr{SST}$) of the linear model, $\bY ’\mb{MY}$ is the error (residual) sum of squares ($\mr{SSR}$), and $\bY ’\mb{HY}$ is the uncorrected model sum of squares. This leads to the analysis of variance table shown in Table 3.2.

Table 3.2: Analysis of Variance with Uncorrected Sums of Squares

Source          df                   Sum of Squares
--------------  -------------------  --------------------------------------------------
Model           $\mr{rank}(\bX )$    $\mr{SSM} = \bY ’\bH \bY = \widehat{\bbeta }’\bX ’\bY $
Residual        $n-\mr{rank}(\bX )$  $\mr{SSR} = \bY ’\bM \bY = \bY ’\bY - \widehat{\bbeta }’\bX ’\bY = \sum _{i=1}^ n\left(Y_ i - \widehat{Y}_ i\right)^2$
Uncorr. Total   $n$                  $\mr{SST} = \bY ’\bY = \sum _{i=1}^ n Y_ i^2$
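For the hypothetical data used earlier, the quantities in Table 3.2 can be computed directly; note that the model degrees of freedom can be read off as the trace of $\bH $, because the trace of a projection matrix equals its rank:

proc iml;
   X = {1 1, 1 2, 1 4, 1 5};                /* hypothetical data */
   Y = {2, 3, 5, 7};
   n = nrow(X);
   H = X * inv(X`*X) * X`;
   M = I(n) - H;
   SSM = Y`*H*Y;                            /* uncorrected model sum of squares */
   SSR = Y`*M*Y;                            /* residual sum of squares */
   SST = Y`*Y;                              /* uncorrected total sum of squares */
   dfm = round(trace(H));                   /* rank(X) = trace(H) for a projection matrix */
   print dfm (n - dfm)[label="dfr"] SSM SSR SST (SSM + SSR)[label="SSM+SSR"];
quit;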


When the model contains an intercept term, the analysis of variance is usually corrected for the mean, as shown in Table 3.3.

Table 3.3: Analysis of Variance with Corrected Sums of Squares

Source            df                   Sum of Squares
----------------  -------------------  --------------------------------------------------
Model             $\mr{rank}(\bX )-1$  $\mr{SSM}_ c = \widehat{\bbeta }’\bX ’\bY - n\overline{Y}^2 = \sum _{i=1}^ n\left(\widehat{Y}_ i - \overline{Y}\right)^2$
Residual          $n-\mr{rank}(\bX )$  $\mr{SSR} = \bY ’\bM \bY = \bY ’\bY - \widehat{\bbeta }’\bX ’\bY = \sum _{i=1}^ n\left(Y_ i - \widehat{Y}_ i\right)^2$
Corrected Total   $n-1$                $\mr{SST}_ c = \bY ’\bY - n\overline{Y}^2 = \sum _{i=1}^ n\left(Y_ i - \overline{Y}\right)^2$
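The corrected quantities in Table 3.3 follow the same pattern; a minimal sketch with the same hypothetical data:

proc iml;
   X = {1 1, 1 2, 1 4, 1 5};                /* hypothetical data */
   Y = {2, 3, 5, 7};
   n = nrow(X);
   b_hat = inv(X`*X) * X`*Y;
   Ybar  = Y[:];                            /* sample mean of Y */
   SSMc  = b_hat`*X`*Y - n*(Ybar**2);       /* corrected model sum of squares */
   SSR   = ssq(Y - X*b_hat);                /* residual sum of squares */
   SSTc  = ssq(Y - Ybar);                   /* corrected total sum of squares */
   print SSMc SSR SSTc (SSMc + SSR)[label="SSMc+SSR"];
quit;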


The coefficient of determination, also called the R-square statistic, measures the proportion of the total variation that is explained by the linear model. In models with an intercept, it is defined as the ratio

\[ R^2 = 1 - \frac{ \mr{SSR} }{ \mr{SST}_ c } = 1 - \frac{ \sum _{i=1}^ n \left(Y_ i - \widehat{Y}_ i\right)^2 }{ \sum _{i=1}^ n \left(Y_ i - \overline{Y} \right)^2 } \]

In models without an intercept, the R-square statistic is instead based on the uncorrected sums of squares:

\[ R^2 = 1 - \frac{ \mr{SSR} }{ \mr{SST} } = 1 - \frac{ \sum _{i=1}^ n \left(Y_ i - \widehat{Y}_ i\right)^2 }{ \sum _{i=1}^ n Y_ i^2 } \]
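The following PROC IML sketch computes both versions for hypothetical data, fitting the model with and without the intercept column and using the matching total sum of squares in each case:

proc iml;
   Xi = {1 1, 1 2, 1 4, 1 5};               /* design with an intercept column */
   Xn = {1, 2, 4, 5};                       /* design without an intercept */
   Y  = {2, 3, 5, 7};
   SSRi = ssq(Y - Xi*inv(Xi`*Xi)*Xi`*Y);    /* residual SS, intercept model */
   SSRn = ssq(Y - Xn*inv(Xn`*Xn)*Xn`*Y);    /* residual SS, no-intercept model */
   R2i = 1 - SSRi / ssq(Y - Y[:]);          /* corrected total SS in the denominator */
   R2n = 1 - SSRn / ssq(Y);                 /* uncorrected total SS in the denominator */
   print R2i R2n;
quit;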