Introduction to Statistical Modeling with SAS/STAT Software


Analysis of Variance

The identity

\[  \bY = \bX \tilde{\bbeta } + \left(\bY -\bX \tilde{\bbeta }\right)  \]

holds for all vectors $\tilde{\bbeta}$, but only for the least squares solution is the residual $(\bY -\bX\widehat{\bbeta})$ orthogonal to the predicted value $\bX\widehat{\bbeta}$. Because of this orthogonality, the additive identity holds not only for the vectors themselves but also for their squared lengths (Pythagorean theorem):

\[  ||\bY ||^2 = ||\bX \widehat{\bbeta }||^2 + ||(\bY -\bX \widehat{\bbeta })||^2  \]

Note that $\bX\widehat{\bbeta} = \bX\left(\bX'\bX\right)^{-1}\bX'\bY = \bH\bY$ and that $\bY - \bX\widehat{\bbeta} = (\bI - \bH)\bY = \bM\bY$. The matrices $\bH$ and $\bM = \bI -\bH$ play an important role in the theory of linear models and in statistical computations. Both are projection matrices—that is, they are symmetric and idempotent. (An idempotent matrix $\bA$ is a square matrix that satisfies $\bA\bA = \bA$. The eigenvalues of an idempotent matrix take on the values 1 and 0 only.) The matrix $\bH$ projects onto the subspace of $R^n$ that is spanned by the columns of $\bX$. The matrix $\bM$ projects onto the orthogonal complement of that space. Because of these properties you have $\bH'=\bH$, $\bH\bH=\bH$, $\bM'=\bM$, $\bM\bM=\bM$, and $\bH\bM=\mb{0}$.
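For example, the orthogonality of the two projections follows directly from idempotency and the definition $\bM = \bI - \bH$:

\[  \bH\bM = \bH\left(\bI - \bH\right) = \bH - \bH\bH = \bH - \bH = \mb{0}  \]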

The Pythagorean relationship can now be written in terms of $\bH$ and $\bM$ as follows:

\[  ||\bY||^2 = \bY'\bY = ||\bH\bY||^2 + ||\bM\bY||^2 = \bY'\bH'\bH\bY + \bY'\bM'\bM\bY = \bY'\bH\bY + \bY'\bM\bY  \]
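These properties are easy to verify numerically. The following PROC IML sketch uses a small made-up design matrix and response (chosen here purely for illustration; the values do not come from the text) to build $\bH$ and $\bM$, check symmetry, idempotency, and orthogonality, and confirm that $\bY'\bY = \bY'\bH\bY + \bY'\bM\bY$:

proc iml;
   /* small illustrative design: intercept plus one regressor */
   x = {1 1, 1 2, 1 3, 1 4};
   y = {1, 3, 2, 5};
   h = x*inv(x`*x)*x`;             /* hat matrix H               */
   m = i(nrow(x)) - h;             /* M projects onto complement */
   yhat = h*y;                     /* fitted values, Yhat = HY   */
   /* projection properties: all three should print as 0        */
   print (max(abs(h - h`)))  [label="max |H - H'|"],
         (max(abs(h*h - h))) [label="max |HH - H|"],
         (max(abs(h*m)))     [label="max |HM|"];
   /* Pythagorean decomposition of the squared length of Y      */
   print (ssq(y))  [label="Y'Y"],
         (y`*h*y)  [label="Y'HY"],
         (y`*m*y)  [label="Y'MY"];
quit;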

If $\bX'\bX$ is deficient in rank and a generalized inverse is used to solve the normal equations, then you work instead with the projection matrix $\bH =\bX\left(\bX'\bX\right)^{-}\bX'$. Note that if $\bG$ is a generalized inverse of $\bX'\bX$, then $\bX\bG\bX'$, and hence also $\bH$ and $\bM$, are invariant to the choice of $\bG$.
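The invariance can be demonstrated directly. In the sketch below (again with made-up data), the design matrix is deliberately rank-deficient; one generalized inverse is obtained from the GINV function (Moore-Penrose), and a second one is constructed from the fact that $\bG + \mb{U} - \bG\bA\mb{U}\bA\bG$ is again a generalized inverse of $\bA$ for any matrix $\mb{U}$ (easily checked from $\bA\bG\bA = \bA$). The two resulting $\bH$ matrices agree:

proc iml;
   /* rank-deficient design: column 3 = column 1 - column 2    */
   x  = {1 1 0, 1 1 0, 1 0 1, 1 0 1};
   a  = x`*x;                      /* X'X is 3x3 with rank 2    */
   g1 = ginv(a);                   /* Moore-Penrose g-inverse   */
   u  = j(3, 3, 0.5);              /* an arbitrary matrix U     */
   g2 = g1 + u - g1*a*u*a*g1;      /* another g-inverse of X'X  */
   h1 = x*g1*x`;                   /* H computed from G1        */
   h2 = x*g2*x`;                   /* H computed from G2        */
   /* H is invariant to the choice: difference prints as 0     */
   print (max(abs(h1 - h2))) [label="max |H1 - H2|"];
quit;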

The matrix $\bH$ is sometimes referred to as the "hat" matrix because when you premultiply the vector of observations by $\bH$, you produce the fitted values, which are commonly denoted by placing a "hat" over the $\bY$ vector: $\widehat{\bY} = \bH\bY$.

The term $\bY'\bY$ is the uncorrected total sum of squares ($\mr{SST}$) of the linear model, $\bY'\bM\bY$ is the error (residual) sum of squares ($\mr{SSR}$), and $\bY'\bH\bY$ is the uncorrected model sum of squares ($\mr{SSM}$). This leads to the analysis of variance table shown in Table 3.2.

Table 3.2: Analysis of Variance with Uncorrected Sums of Squares

Source        | df                 | Sum of Squares
Model         | $\mr{rank}(\bX)$   | $\mr{SSM} = \bY'\bH\bY = \widehat{\bbeta}'\bX'\bY$
Residual      | $n-\mr{rank}(\bX)$ | $\mr{SSR} = \bY'\bM\bY = \bY'\bY - \widehat{\bbeta}'\bX'\bY = \sum_{i=1}^n\left(Y_i - \widehat{Y}_i\right)^2$
Uncorr. Total | $n$                | $\mr{SST} = \bY'\bY = \sum_{i=1}^n Y_i^2$
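The entries of Table 3.2 can be evaluated directly from the matrix expressions, as in the following PROC IML sketch (same made-up data as above). The model degrees of freedom are obtained as $\mr{rank}(\bX) = \mr{trace}(\bH)$, which holds because the eigenvalues of a projection matrix are 0 or 1:

proc iml;
   x = {1 1, 1 2, 1 3, 1 4};
   y = {1, 3, 2, 5};
   h   = x*inv(x`*x)*x`;           /* hat matrix                */
   b   = inv(x`*x)*x`*y;           /* least squares estimate    */
   ssm = y`*h*y;                   /* uncorrected model SS      */
   ssr = ssq(y - x*b);             /* residual SS               */
   sst = ssq(y);                   /* uncorrected total SS      */
   dfm = round(trace(h));          /* rank(X) = trace(H)        */
   dfr = nrow(x) - dfm;            /* residual df               */
   print dfm ssm, dfr ssr, (nrow(x)) [label="n"] sst;
quit;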


When the model contains an intercept term, the analysis of variance is usually corrected for the mean, as shown in Table 3.3.

Table 3.3: Analysis of Variance with Corrected Sums of Squares

Source          | df                 | Sum of Squares
Model           | $\mr{rank}(\bX)-1$ | $\mr{SSM}_c = \widehat{\bbeta}'\bX'\bY - n\overline{Y}^2 = \sum_{i=1}^n\left(\widehat{Y}_i - \overline{Y}\right)^2$
Residual        | $n-\mr{rank}(\bX)$ | $\mr{SSR} = \bY'\bM\bY = \bY'\bY - \widehat{\bbeta}'\bX'\bY = \sum_{i=1}^n\left(Y_i - \widehat{Y}_i\right)^2$
Corrected Total | $n-1$              | $\mr{SST}_c = \bY'\bY - n\overline{Y}^2 = \sum_{i=1}^n\left(Y_i - \overline{Y}\right)^2$
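A companion sketch for Table 3.3 (same made-up data) shows that only the model and total lines change under the correction for the mean, while the residual sum of squares is identical to the uncorrected analysis:

proc iml;
   x = {1 1, 1 2, 1 3, 1 4};
   y = {1, 3, 2, 5};
   n    = nrow(x);
   b    = inv(x`*x)*x`*y;          /* least squares estimate    */
   ybar = y[:];                    /* sample mean of Y          */
   ssmc = b`*x`*y - n*(ybar##2);   /* corrected model SS        */
   ssr  = ssq(y - x*b);            /* residual SS (unchanged)   */
   sstc = ssq(y - ybar);           /* corrected total SS        */
   /* corrected model and residual SS add up to SSTc           */
   print ssmc ssr sstc, (ssmc + ssr) [label="SSMc + SSR"];
quit;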


The coefficient of determination, also called the R-square statistic, measures the proportion of the total variation that is explained by the linear model. In models with an intercept, it is defined as the ratio

\[  R^2 = 1 - \frac{\mr{SSR}}{\mr{SST}_c} = 1 - \frac{\sum_{i=1}^n \left(Y_i - \widehat{Y}_i\right)^2}{\sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2}  \]

In models without an intercept, the R-square statistic is a ratio of the uncorrected sums of squares:

\[  R^2 = 1 - \frac{\mr{SSR}}{\mr{SST}} = 1 - \frac{\sum_{i=1}^n \left(Y_i - \widehat{Y}_i\right)^2}{\sum_{i=1}^n Y_i^2}  \]
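Both forms of the statistic follow directly from the residuals. In the sketch below (same made-up data), the model with intercept is evaluated against the corrected total sum of squares, and a no-intercept fit, obtained here by dropping the intercept column, is evaluated against the uncorrected total:

proc iml;
   x = {1 1, 1 2, 1 3, 1 4};
   y = {1, 3, 2, 5};
   /* model with intercept: corrected total SS in denominator  */
   b    = inv(x`*x)*x`*y;
   r2   = 1 - ssq(y - x*b)/ssq(y - y[:]);
   /* model without intercept: uncorrected total SS            */
   x0   = x[, 2];                  /* drop the intercept column */
   b0   = inv(x0`*x0)*x0`*y;
   r2_0 = 1 - ssq(y - x0*b0)/ssq(y);
   print r2 r2_0;
quit;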