The Four Types of Estimable Functions


Type I SS and Estimable Functions

In PROC GLM, the Type I SS and the associated hypotheses they test are byproducts of the modified sweep operator used to compute a generalized $g_2$-inverse of $\mb{X'X}$ and a solution to the normal equations. For the model $\mr{E}[Y] = x_1\beta _1 + x_2\beta _2 + x_3\beta _3$, the Type I SS for each effect are as follows:

\[  \begin{array}{ll} \mbox{Effect} &  \mbox{Type I SS} \\[0.05in] x_1 &  R(\beta _1) \\[0.05in] x_2 &  R(\beta _2~ |~ \beta _1) \\[0.05in] x_3 &  R(\beta _3~ |~ \beta _1, \beta _2) \end{array}  \]

Some other SAS/STAT procedures (for example, PROC MIXED and PROC GLIMMIX) also compute Type I hypotheses by sweeping $\mb{X'X}$. However, their Type I test statistics are not necessarily equivalent to the results obtained by using those procedures to fit a sequence of models that contain successively more effects.

The Type I SS are model-order dependent; each effect is adjusted only for the preceding effects in the model.
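The order dependence can be checked numerically. The following Python sketch is not the sweep algorithm that PROC GLM uses; it simply refits the model with one more column at a time and takes differences of the uncorrected model sums of squares. The data vectors `x1`, `x2`, `x3`, and `y` and all helper names are made up for illustration, and each effect is assumed to be a single, linearly independent column.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def reduction(cols, y):
    """R(.) = b'X'y, the uncorrected model SS for the given columns."""
    if not cols:
        return Fraction(0)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    return sum(b * t for b, t in zip(beta, xty))

def type1_ss(cols, y):
    """Sequential SS: each column is adjusted only for the columns before it."""
    out, prev = [], Fraction(0)
    for k in range(1, len(cols) + 1):
        cur = reduction(cols[:k], y)
        out.append(cur - prev)
        prev = cur
    return out

# Hypothetical data: intercept column plus two regressors.
x1 = [1, 1, 1, 1, 1]
x2 = [1, 2, 3, 4, 5]
x3 = [1, 0, 2, 1, 3]
y = [2, 3, 5, 4, 8]

ss_123 = type1_ss([x1, x2, x3], y)  # R(b1), R(b2|b1), R(b3|b1,b2)
ss_132 = type1_ss([x1, x3, x2], y)  # R(b1), R(b3|b1), R(b2|b1,b3)
```

With these data, `ss_123` is 484/5, 169/10, 27/10 and `ss_132` is 484/5, 1058/65, 216/65. The totals agree, but the sum of squares attributed to `x2` changes from 169/10 to 216/65 when it is entered after `x3`.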

There are numerous ways to obtain a Type I hypothesis matrix $\mb{L}$ for each effect. One way is to form the $\mb{X'X}$ matrix and then reduce $\mb{X'X}$ to an upper triangular matrix by row operations, skipping over any rows with a zero diagonal. The nonzero rows of the resulting matrix associated with $x_1$ provide an $\mb{L}$ such that

\[  \mbox{SS}(H_0\colon ~  \mb{L}\bbeta = \mb{0}) = R(\beta _1)  \]

The nonzero rows of the resulting matrix associated with $x_2$ provide an $\mb{L}$ such that

\[  \mbox{SS}(H_0\colon ~  \mb{L}\bbeta = \mb{0}) = R(\beta _2~ |~ \beta _1)  \]

The last set of nonzero rows (associated with $x_3$) provides an $\mb{L}$ such that

\[  \mbox{SS}(H_0\colon ~  \mb{L}\bbeta = \mb{0}) = R(\beta _3~ |~ \beta _1, \beta _2)  \]
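The row reduction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PROC GLM implementation: the function name `forward_eliminate` is hypothetical, and the example uses an assumed one-way layout (an intercept and a two-level factor with 2 and 3 observations) rather than the three-effect model, because its $\mb{X'X}$ is singular and so shows a row being skipped.

```python
from fractions import Fraction

def forward_eliminate(xtx):
    """Reduce a symmetric X'X to upper triangular form by row operations,
    skipping any row whose diagonal entry is zero when its turn comes."""
    U = [[Fraction(v) for v in row] for row in xtx]
    n = len(U)
    for k in range(n):
        if U[k][k] == 0:
            continue  # zero diagonal: this row is not used as a pivot
        for r in range(k + 1, n):
            f = U[r][k] / U[k][k]
            U[r] = [a - f * b for a, b in zip(U[r], U[k])]
    return U

# Hypothetical design: columns are (intercept, level 1 dummy, level 2 dummy).
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1], [1, 0, 1]]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]

U = forward_eliminate(xtx)
L_intercept = [U[0]]                        # nonzero rows for the intercept
L_factor = [row for row in U[1:] if any(row)]  # nonzero rows for the factor
```

Here `U` has rows (5, 2, 3), (0, 6/5, -6/5), and (0, 0, 0). The first row is the $\mb{L}$ for the intercept, and the single nonzero row among the factor's rows is proportional to the difference between the two level parameters.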

Another, more formal representation of the Type I generating sets for $x_1$, $x_2$, and $x_3$, respectively, is

\[  \begin{array}{lcccccccc} \Strong{G}_1 &  = &  ( &  \Strong{X}_1'\Strong{X}_1 &  | &  \Strong{X}_1'\Strong{X}_2 &  | &  \Strong{X}_1'\Strong{X}_3 &  ) \\[0.05in] \Strong{G}_2 &  = &  ( &  0 &  | &  \Strong{X}_2'\Strong{M}_1\Strong{X}_2 &  | &  \Strong{X}_2'\Strong{M}_1\Strong{X}_3 &  ) \\[0.05in] \Strong{G}_3 &  = &  ( &  0 &  | &  0 &  | &  \Strong{X}_3'\Strong{M}_2\Strong{X}_3 &  ) \\[0.05in]\end{array}  \]

where

\[  \Strong{M}_1 = \Strong{I} - \Strong{X}_1(\Strong{X}_1'\Strong{X}_1)^-\Strong{X}_1'  \]

and

\[  \Strong{M}_2 = \Strong{M}_1 - \Strong{M}_1\Strong{X}_2(\Strong{X}_2'\Strong{M}_1\Strong{X}_2)^-\Strong{X}_2'\Strong{M}_1  \]

Using the Type I generating set $\mb{G}_2$ (for example), if an $\mb{L}$ is formed from linear combinations of the rows of $\mb{G}_2$ such that $\mb{L}$ is of full row rank and of the same row rank as $\mb{G}_2$, then SS$(H_0\colon ~  \mb{L}\bbeta =\mb{0})=R(\beta _2~ |~ \beta _1)$.
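The generating sets can also be computed directly from their definitions. The Python sketch below does this for a small assumed data set in which each of $\mb{X}_1$, $\mb{X}_2$, and $\mb{X}_3$ is a single column, so every generalized inverse reduces to a scalar reciprocal; a multi-column effect would need a true generalized inverse, which is not shown. The data values and the helper names (`quad`, `project_out`) are hypothetical.

```python
from fractions import Fraction

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def quad(A, M, B):
    """Return the scalar A' M B for single-column A and B."""
    return matmul(matmul(transpose(A), M), B)[0][0]

def project_out(M, X):
    """Return M - M X (X' M X)^{-1} X' M for a single-column X and symmetric M."""
    MX = matmul(M, X)                 # n x 1
    d = quad(X, M, X)                 # scalar X' M X, assumed nonzero
    outer = matmul(MX, transpose(MX)) # M X X' M, using symmetry of M
    return [[m - o / d for m, o in zip(rm, ro)] for rm, ro in zip(M, outer)]

# Hypothetical single-column effects (intercept and two regressors).
n = 5
X1 = [[Fraction(1)] for _ in range(n)]
X2 = [[Fraction(v)] for v in (1, 2, 3, 4, 5)]
X3 = [[Fraction(v)] for v in (1, 0, 2, 1, 3)]
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

M1 = project_out(I, X1)   # I - X1 (X1'X1)^- X1'
M2 = project_out(M1, X2)  # M1 - M1 X2 (X2'M1 X2)^- X2'M1

G1 = [quad(X1, I, X1), quad(X1, I, X2), quad(X1, I, X3)]
G2 = [0, quad(X2, M1, X2), quad(X2, M1, X3)]
G3 = [0, 0, quad(X3, M2, X3)]
```

For these data the sets are `G1 = (5, 15, 7)`, `G2 = (0, 10, 5)`, and `G3 = (0, 0, 27/10)`, which are the same rows that result from reducing $\mb{X'X}$ to upper triangular form.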

In the GLM procedure, the Type I estimable functions displayed symbolically when the E1 option is requested are

\begin{eqnarray*}  \Strong{G}_1^* &  = &  (\Strong{X}_1' \Strong{X}_1)^-\Strong{G}_1 \\[0.05in] \Strong{G}_2^* &  = &  (\Strong{X}_2'\Strong{M}_1\Strong{X}_2)^-\Strong{G}_2 \\[0.05in] \Strong{G}_3^* &  = &  (\Strong{X}_3'\Strong{M}_2\Strong{X}_3)^-\Strong{G}_3 \end{eqnarray*}

As can be seen from the nature of the generating sets $\mb{G}_1$, $\mb{G}_2$, and $\mb{G}_3$, only the Type I estimable functions for $\beta _3$ are guaranteed not to involve the $\beta _1$ and $\beta _2$ parameters. The Type I hypothesis for $\beta _2$ can (and often does) involve $\beta _3$ parameters, and likewise the Type I hypothesis for $\beta _1$ often involves $\beta _2$ and $\beta _3$ parameters.
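As a small worked illustration (an assumed example, not one taken from the model above), consider the one-way model $\mr{E}[Y_{ij}] = \mu + \alpha _ i$ with two levels and cell counts $n_1 = 2$ and $n_2 = 3$, where $\mb{X}_1$ is the intercept column and $\mb{X}_2$ holds the two indicator columns. Then

\[  \mb{G}_1 = (~ 5~ |~ 2~ ~ 3~ ), \qquad \mb{G}_1^* = \frac{1}{5}\mb{G}_1 = (~ 1~ |~ 0.4~ ~ 0.6~ )  \]

so the Type I hypothesis for the intercept is $\mu + 0.4\alpha _1 + 0.6\alpha _2 = 0$, which involves the $\alpha $ parameters. For the second effect,

\[  \mb{X}_2'\mb{M}_1\mb{X}_2 = \frac{6}{5}\left( \begin{array}{rr} 1 &  -1 \\ -1 &  1 \end{array} \right)  \]

and, with one possible choice of generalized inverse (the diagonal matrix with entries $5/6$ and $0$), the only nonzero row of $\mb{G}_2^*$ is $(~ 0~ |~ 1~ ~ -1~ )$. The Type I hypothesis for the factor is therefore $\alpha _1 - \alpha _2 = 0$, which does not involve $\mu $.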

There are, however, a number of models for which the Type I hypotheses are considered appropriate. These are as follows:

  • balanced ANOVA models specified in proper sequence (that is, interactions do not precede main effects in the MODEL statement and so forth)

  • purely nested models (specified in the proper sequence)

  • polynomial regression models (in the proper sequence)