General Form of an Estimable Function

This section demonstrates a shorthand technique for displaying the generating set for any estimable $\mb {L}$. Suppose

\[  \mb {X} = \left[ \begin{array}{cccc} 1 &  1 &  0 &  0 \\ 1 &  1 &  0 &  0 \\ 1 &  0 &  1 &  0 \\ 1 &  0 &  1 &  0 \\ 1 &  0 &  0 &  1 \\ 1 &  0 &  0 &  1 \end{array} \right] ~ ~  \mbox{ and } ~ ~  \bbeta = \left[ \begin{array}{c} \mu \\ A_1 \\ A_2 \\ A_3 \end{array} \right]  \]

$\mb {X}$ is a generating set for $\mb {L}$, but so is the smaller set

\[  \mb {X^*} = \left[ \begin{array}{cccc} 1 &  1 &  0 &  0 \\ 1 &  0 &  1 &  0 \\ 1 &  0 &  0 &  1 \\ \end{array} \right]  \]

$\mb {X^*}$ is formed from $\mb {X}$ by deleting duplicate rows.

Since every estimable $\mb {L}$ must be a linear combination of the rows of $\mb {X^*}$, an $\mb {L}$ for a single-degree-of-freedom estimate can be represented symbolically as

\[  \mathit{L1} \times (1~ 1~ 0~ 0) + \mathit{L2} \times (1~ 0~ 1~ 0) + \mathit{L3} \times (1~ 0~ 0~ 1)  \]

or

\[  \mb {L} = (\mathit{L1}+\mathit{L2}+\mathit{L3},~ \mathit{L1},~ \mathit{L2},~ \mathit{L3}) ~   \]

For this example, $\mb {L} \bbeta $ is estimable if and only if the first element of $\mb {L}$ is equal to the sum of the other elements of $\mb {L}$; equivalently,

\[  \mb {L} \bbeta = (\mathit{L1}+\mathit{L2}+\mathit{L3}) \times \mu + \mathit{L1} \times A_1 + \mathit{L2} \times A_2 + \mathit{L3} \times A_3  \]

is estimable for any values of $\mathit{L1}$, $\mathit{L2}$, and $\mathit{L3}$.
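The rule for this example can be checked numerically. The following Python sketch is illustrative only and is not part of PROC GLM; the helper names `general_form` and `is_estimable` are hypothetical. It builds $\mb {L}$ as a weighted sum of the rows of $\mb {X^*}$ and tests whether the first element equals the sum of the others.

```python
# Rows of X* from the text (columns: mu, A1, A2, A3).
X_STAR = [
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (1, 0, 0, 1),
]

def general_form(l1, l2, l3):
    """Combine the rows of X* with weights L1, L2, L3 (hypothetical helper)."""
    weights = (l1, l2, l3)
    return tuple(
        sum(w * row[j] for w, row in zip(weights, X_STAR))
        for j in range(4)
    )

def is_estimable(L):
    """Rule for this one-way example only: first element equals the sum of the rest."""
    return L[0] == sum(L[1:])

contrast = general_form(1, -1, 0)   # weights that compare A1 with A2
a1_alone = (0, 1, 0, 0)             # attempts to isolate A1 by itself

print(contrast, is_estimable(contrast))
print(a1_alone, is_estimable(a1_alone))
```

With weights (1, -1, 0), the result is (0, 1, -1, 0), so the difference $A_1 - A_2$ satisfies the rule. The vector (0, 1, 0, 0) does not, because its first element is 0 while the remaining elements sum to 1.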

If other generating sets for $\mb {L}$ are represented symbolically, the symbolic notation looks different. However, the inherent nature of the rules is the same. For example, if row operations are performed on $\mb {X^*}$ to produce an identity matrix in the first $3 \times 3$ submatrix of the resulting matrix

\[  \mb {X^{**}} = \left[ \begin{array}{rrrr} 1 &  0 &  0 &  1 \\ 0 &  1 &  0 &  -1 \\ 0 &  0 &  1 &  -1 \end{array} \right]  \]

then $\mb {X^{**}}$ is also a generating set for $\mb {L}$. An estimable $\mb {L}$ generated from $\mb {X^{**}}$ can be represented symbolically as

\[  \mb {L} = (\mathit{L1},~ \mathit{L2},~ \mathit{L3},~ \mathit{L1}-\mathit{L2}-\mathit{L3}) ~   \]

Note that, again, the first element of $\mb {L}$ is equal to the sum of the other elements.
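The row operations that take $\mb {X^*}$ to $\mb {X^{**}}$ can be reproduced with ordinary Gauss-Jordan elimination. The sketch below is one way to do this, assuming exact rational arithmetic from Python's standard `fractions` module; the function name `rref` is hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix to reduced row echelon form with exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        m[pivot_row], m[r] = m[r], m[pivot_row]
        # Scale the pivot row so that the pivot element is 1.
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        # Clear the pivot column in every other row.
        for other in range(len(m)):
            if other != pivot_row and m[other][col] != 0:
                factor = m[other][col]
                m[other] = [a - factor * b for a, b in zip(m[other], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

X_star = [(1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1)]
X_star_star = rref(X_star)

for row in X_star_star:
    print([int(v) for v in row])
```

The three rows produced are (1, 0, 0, 1), (0, 1, 0, -1), and (0, 0, 1, -1), which match $\mb {X^{**}}$.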

With multiple generating sets available, the question arises as to which one best represents $\mb {L}$ symbolically. Clearly, a generating set that has a minimum number of rows (that is, one of full row rank) and a maximum number of zero elements is desirable.

The generalized $g_2$-inverse $(\mb {X'X})^{-}$ of $\mb {X'X}$ computed by the modified sweep operation (Goodnight, 1979) has the property that $(\mb {X'X})^{-}\mb {X'X}$ usually contains numerous zeros. For this reason, in PROC GLM the nonzero rows of $(\mb {X'X})^{-}\mb {X'X}$ are used to represent $\mb {L}$ symbolically.
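To illustrate the idea for the one-way example in this section, the following Python sketch applies a textbook-style sweep to the crossproducts matrix of $\mb {X}$. It is an assumption-laden illustration, not the PROC GLM implementation: the function names `g2_sweep` and `matmul` are hypothetical, the arithmetic is exact (so a dependent column is detected by a pivot that is exactly zero, whereas a numerical implementation would need a tolerance), and columns are swept in their natural order.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def g2_sweep(a):
    """Sweep each pivot of a symmetric matrix in order (illustrative sketch).

    A zero pivot marks a column that depends on earlier columns; its row
    and column are set to zero and the pivot is skipped.
    """
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    for k in range(n):
        d = m[k][k]
        if d == 0:
            for j in range(n):
                m[k][j] = Fraction(0)
                m[j][k] = Fraction(0)
            continue
        m[k] = [v / d for v in m[k]]
        for i in range(n):
            if i == k:
                continue
            b = m[i][k]
            m[i] = [x - b * y for x, y in zip(m[i], m[k])]
            m[i][k] = -b / d
        m[k][k] = 1 / d
    return m

# Design matrix from the beginning of this section.
X = [
    (1, 1, 0, 0), (1, 1, 0, 0),
    (1, 0, 1, 0), (1, 0, 1, 0),
    (1, 0, 0, 1), (1, 0, 0, 1),
]
XtX = matmul(list(zip(*X)), X)      # crossproducts matrix X'X
G = g2_sweep(XtX)                   # a generalized inverse of X'X
H = matmul(G, XtX)                  # the product (X'X)^- X'X

# Keep only the nonzero rows as the generating set.
generating_set = [row for row in H if any(v != 0 for v in row)]
for row in generating_set:
    print([int(v) for v in row])
```

In this sketch the fourth pivot is zero, so the fourth row of the product is zero, and the three remaining rows are (1, 0, 0, 1), (0, 1, 0, -1), and (0, 0, 1, -1). For this example they coincide with $\mb {X^{**}}$.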

If the generating set represented symbolically is of full row rank, the number of symbols $(\mathit{L1}, \mathit{L2}, \ldots )$ represents the maximum rank of any testable hypothesis (in other words, the maximum number of linearly independent rows for any $\mb {L}$ matrix that can be constructed). By letting each symbol in turn take on the value of 1 while the others are set to 0, the original generating set can be reconstructed.
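The reconstruction can be sketched in a few lines of Python, using the symbolic form generated from $\mb {X^{**}}$ earlier in this section. The function name `symbolic_L` is hypothetical and is used only for illustration.

```python
def symbolic_L(l1, l2, l3):
    """General form generated from X**: (L1, L2, L3, L1 - L2 - L3)."""
    return (l1, l2, l3, l1 - l2 - l3)

n_symbols = 3

# Set each symbol to 1 in turn while the others are 0.
unit_settings = [
    tuple(1 if i == j else 0 for j in range(n_symbols))
    for i in range(n_symbols)
]
reconstructed = [symbolic_L(*setting) for setting in unit_settings]

for row in reconstructed:
    print(row)
```

The three rows obtained are (1, 0, 0, 1), (0, 1, 0, -1), and (0, 0, 1, -1), which are the rows of $\mb {X^{**}}$. The three symbols correspond to a maximum hypothesis rank of 3 for this model.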