The Four Types of Estimable Functions


General Form of an Estimable Function

This section demonstrates a shorthand technique for displaying the generating set for any estimable $\mb{L}$. Suppose

\[ \mb{X} = \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{array} \right] ~ ~ \mbox{ and } ~ ~ \bbeta = \left[ \begin{array}{c} \mu \\ A_1 \\ A_2 \\ A_3 \end{array} \right] \]

The rows of $\mb{X}$ form a generating set for $\mb{L}$, but so do the rows of the smaller matrix

\[ \mb{X^*} = \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \end{array} \right] \]

$\mb{X^*}$ is formed from $\mb{X}$ by deleting duplicate rows.
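The deduplication step can be sketched in a few lines of Python. This is only an illustration: the helper name `unique_rows` is hypothetical, and rows are compared exactly, which is safe here because the design matrix contains only 0s and 1s.

```python
# Design matrix X from the example; columns correspond to mu, A1, A2, A3.
X = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (1, 0, 1, 0),
    (1, 0, 0, 1),
    (1, 0, 0, 1),
]


def unique_rows(rows):
    """Return the distinct rows, in order of first appearance."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


X_star = unique_rows(X)
for row in X_star:
    print(row)
# (1, 1, 0, 0)
# (1, 0, 1, 0)
# (1, 0, 0, 1)
```

Removing duplicate rows does not change the row space, so every linear combination of the rows of $\mb{X}$ is still a linear combination of the rows of $\mb{X^*}$.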

Since $\mb{L} \bbeta $ is estimable only if $\mb{L}$ is a linear combination of the rows of $\mb{X^*}$, an $\mb{L}$ for a single-degree-of-freedom estimate can be represented symbolically as

\[ \mathit{L1} \times (1~ 1~ 0~ 0) + \mathit{L2} \times (1~ 0~ 1~ 0) + \mathit{L3} \times (1~ 0~ 0~ 1) \]

or

\[ \mb{L} = (\mathit{L1}+\mathit{L2}+\mathit{L3},~ \mathit{L1},~ \mathit{L2},~ \mathit{L3}) ~ \]

For this example, $\mb{L} \bbeta $ is estimable if and only if the first element of $\mb{L}$ is equal to the sum of the other elements of $\mb{L}$; equivalently,

\[ \mb{L} \bbeta = (\mathit{L1}+\mathit{L2}+\mathit{L3}) \times \mu + \mathit{L1} \times A_1 + \mathit{L2} \times A_2 + \mathit{L3} \times A_3 \]

is estimable for any values of $\mathit{L1}$, $\mathit{L2}$, and $\mathit{L3}$.
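As a quick check of this rule, the following Python sketch builds $\mb{L}$ from chosen values of the symbols and tests whether the first element equals the sum of the others. The function names `general_form` and `is_estimable` are hypothetical, and the exact comparison assumes integer (or otherwise exact) coefficients; the rule itself applies only to this one-way example.

```python
# Rows of X* from the example; columns correspond to mu, A1, A2, A3.
X_STAR = [
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (1, 0, 0, 1),
]


def general_form(l1, l2, l3):
    """Return L1*(row 1) + L2*(row 2) + L3*(row 3) of X*."""
    weights = (l1, l2, l3)
    n_cols = len(X_STAR[0])
    return tuple(
        sum(w * row[j] for w, row in zip(weights, X_STAR))
        for j in range(n_cols)
    )


def is_estimable(L):
    """Rule for this example only: first element equals the sum of the rest."""
    return L[0] == sum(L[1:])


contrast = general_form(1, -1, 0)      # the difference A1 - A2
print(contrast, is_estimable(contrast))  # (0, 1, -1, 0) True
print(is_estimable((0, 1, 0, 0)))        # A1 alone: False
```

Any $\mb{L}$ produced by `general_form` satisfies the rule by construction, whereas a vector such as $(0,~ 1,~ 0,~ 0)$, which picks out $A_1$ alone, does not.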

If other generating sets for $\mb{L}$ are represented symbolically, the symbolic notation looks different. However, the inherent nature of the rules is the same. For example, if row operations are performed on $\mb{X^*}$ to produce an identity matrix in the first $3 \times 3$ submatrix of the resulting matrix

\[ \mb{X^{**}} = \left[ \begin{array}{rrrr} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{array} \right] \]

then $\mb{X^{**}}$ is also a generating set for $\mb{L}$. An estimable $\mb{L}$ generated from $\mb{X^{**}}$ can be represented symbolically as

\[ \mb{L} = (\mathit{L1},~ \mathit{L2},~ \mathit{L3},~ \mathit{L1}-\mathit{L2}-\mathit{L3}) ~ \]

Note that, again, the first element of $\mb{L}$ is equal to the sum of the other elements.
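The row operations that lead from $\mb{X^*}$ to $\mb{X^{**}}$ can be reproduced with a small Gauss-Jordan routine. The sketch below uses exact rational arithmetic from the Python standard library; the function name `reduce_rows` is hypothetical, and the routine is a generic textbook reduction rather than anything specific to PROC GLM.

```python
from fractions import Fraction

X_STAR = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]


def reduce_rows(matrix):
    """Gauss-Jordan elimination with exact fractions (reduced row echelon form)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivot_row = 0
    for col in range(len(rows[0])):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, len(rows)) if rows[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [v / pivot for v in rows[pivot_row]]
        # Clear the pivot column in every other row.
        for i in range(len(rows)):
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return rows


X_star_star = reduce_rows(X_STAR)
for row in X_star_star:
    print([int(v) for v in row])
# [1, 0, 0, 1]
# [0, 1, 0, -1]
# [0, 0, 1, -1]

# Each row still has its first element equal to the sum of the others.
print(all(row[0] == sum(row[1:]) for row in X_star_star))  # True
```

Because row operations preserve the row space, the estimability rule carries over unchanged to the new generating set.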

With multiple generating sets available, the question arises as to which one is best for representing $\mb{L}$ symbolically. Clearly, a generating set with the minimum number of rows (that is, one of full row rank) and as many zero elements as possible is desirable.

The generalized $g_2$-inverse $(\mb{X'X})^{-}$ of $\mb{X'X}$ computed by the modified sweep operation (Goodnight 1979) has the property that $(\mb{X'X})^{-}\mb{X'X}$ usually contains numerous zeros. For this reason, in PROC GLM the nonzero rows of $(\mb{X'X})^{-}\mb{X'X}$ are used to represent $\mb{L}$ symbolically.
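To see what this product looks like for the one-way example above, the following Python sketch forms $\mb{X'X}$, computes a $g_2$-inverse with a simplified sweep, and prints the nonzero rows of $(\mb{X'X})^{-}\mb{X'X}$. It is an illustration under stated assumptions, not PROC GLM's implementation: the function names are hypothetical, exact fractions replace floating-point arithmetic, and a pivot is treated as singular only when it is exactly zero, whereas a production sweep would use a tolerance.

```python
from fractions import Fraction

X = [
    [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 0, 1, 0], [1, 0, 1, 0],
    [1, 0, 0, 1], [1, 0, 0, 1],
]


def cross_product(x):
    """Return X'X as a list of lists."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


def sweep_g2_inverse(a):
    """Sweep each pivot in turn; zero the row and column of any zero pivot."""
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    for k in range(n):
        d = m[k][k]
        if d == 0:
            # Linearly dependent column: zero out its row and column.
            for j in range(n):
                m[k][j] = Fraction(0)
                m[j][k] = Fraction(0)
            continue
        m[k] = [v / d for v in m[k]]
        for i in range(n):
            if i == k:
                continue
            b = m[i][k]
            m[i] = [x - b * y for x, y in zip(m[i], m[k])]
            m[i][k] = -b / d
        m[k][k] = 1 / d
    return m


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


xtx = cross_product(X)
g2 = sweep_g2_inverse(xtx)
product = matmul(g2, xtx)
nonzero_rows = [row for row in product if any(v != 0 for v in row)]
for row in nonzero_rows:
    print([str(v) for v in row])
# ['1', '0', '0', '1']
# ['0', '1', '0', '-1']
# ['0', '0', '1', '-1']
```

For this example the nonzero rows coincide with the rows of $\mb{X^{**}}$, and the row for the last, linearly dependent column is entirely zero.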

If the generating set represented symbolically is of full row rank, the number of symbols $(\mathit{L1}, \mathit{L2}, \ldots )$ represents the maximum rank of any testable hypothesis (in other words, the maximum number of linearly independent rows for any $\mb{L}$ matrix that can be constructed). By letting each symbol in turn take on the value of 1 while the others are set to 0, the original generating set can be reconstructed.
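The reconstruction described above can be sketched as follows, using the symbolic form $(\mathit{L1},~ \mathit{L2},~ \mathit{L3},~ \mathit{L1}-\mathit{L2}-\mathit{L3})$ from the example. The function name `symbolic_L` is hypothetical and encodes only this particular example.

```python
def symbolic_L(l1, l2, l3):
    """General form from the example: (L1, L2, L3, L1 - L2 - L3)."""
    return (l1, l2, l3, l1 - l2 - l3)


n_symbols = 3
generating_set = []
for k in range(n_symbols):
    # Set symbol k to 1 and every other symbol to 0.
    values = [1 if j == k else 0 for j in range(n_symbols)]
    generating_set.append(symbolic_L(*values))

for row in generating_set:
    print(row)
# (1, 0, 0, 1)
# (0, 1, 0, -1)
# (0, 0, 1, -1)
```

The three recovered rows are the rows of $\mb{X^{**}}$, and the number of symbols, three, is the maximum rank of any testable hypothesis in this example.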