General Form of an Estimable Function :: SAS/STAT(R) 12.1 User's Guide

General Form of an Estimable Function

This section demonstrates a shorthand technique for displaying the generating set for any estimable $\mb {L}$ . Suppose

$\mb {X} = \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{array} \right] ~ ~ \mbox{ and } ~ ~ \bbeta = \left[ \begin{array}{c} \mu \\ A_1 \\ A_2 \\ A_3 \end{array} \right]$

$\mb {X}$ is a generating set for $\mb {L}$ , but so is the smaller set

$\mb {X^*} = \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \end{array} \right]$

$\mb {X^*}$ is formed from $\mb {X}$ by deleting duplicate rows.
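The reduction from $\mb {X}$ to $\mb {X^*}$ can be sketched in a few lines of Python. This is only an illustration: the helper name `unique_rows` is hypothetical and is not part of any SAS procedure.

```python
# Design matrix X from the example: an intercept column plus three indicator columns.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]


def unique_rows(matrix):
    """Return the distinct rows of `matrix`, keeping first-seen order."""
    seen = set()
    result = []
    for row in matrix:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            result.append(list(row))
    return result


X_star = unique_rows(X)
print(X_star)  # [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
```

Deleting duplicate rows leaves the row space unchanged, which is why the smaller set generates the same estimable functions.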

Because $\mb {L} \bbeta$ is estimable only if $\mb {L}$ is a linear function of the rows of $\mb {X^*}$, an $\mb {L}$ for a single-degree-of-freedom estimate can be represented symbolically as

$\mathit{L1} \times (1~ 1~ 0~ 0) + \mathit{L2} \times (1~ 0~ 1~ 0) + \mathit{L3} \times (1~ 0~ 0~ 1)$

or, collecting terms,

$\mb {L} = (\mathit{L1}+\mathit{L2}+\mathit{L3},~ \mathit{L1},~ \mathit{L2},~ \mathit{L3})$

For this example, $\mb {L} \bbeta$ is estimable if and only if the first element of $\mb {L}$ equals the sum of the other elements of $\mb {L}$. Equivalently,

$\mb {L} \bbeta = (\mathit{L1}+\mathit{L2}+\mathit{L3}) \times \mu + \mathit{L1} \times A_1 + \mathit{L2} \times A_2 + \mathit{L3} \times A_3$

is estimable for any values of $\mathit{L1}$, $\mathit{L2}$, and $\mathit{L3}$.
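The estimability rule for this example can be checked numerically. The following is a minimal sketch, assuming exact rational arithmetic and a rank test (a candidate $\mb {L}$ is estimable when appending it to the generating set does not raise the rank). The function names `rank`, `is_estimable`, and `general_form` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Generating set X* from the example (duplicate rows of X removed).
X_STAR = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_estimable(L, generating_set=X_STAR):
    """L is estimable iff it lies in the row space of the generating set."""
    return rank(generating_set + [list(L)]) == rank(generating_set)


def general_form(l1, l2, l3):
    """Symbolic form L = (L1+L2+L3, L1, L2, L3) evaluated at given values."""
    return (l1 + l2 + l3, l1, l2, l3)


print(is_estimable(general_form(2, -1, 5)))  # any L1, L2, L3 gives an estimable L
print(is_estimable((0, 1, -1, 0)))           # A1 - A2: first element 0 = 1 - 1 + 0
print(is_estimable((0, 1, 0, 0)))            # A1 alone: first element 0 != 1
```

The first two calls return `True` and the last returns `False`, matching the rule that the first element of $\mb {L}$ must equal the sum of the others.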

If other generating sets for $\mb {L}$ are represented symbolically, the symbolic notation looks different, but the underlying rules are the same. For example, if row operations are performed on $\mb {X^*}$ to produce an identity matrix in the first $3 \times 3$ submatrix of the resulting matrix,

$\mb {X^{**}} = \left[ \begin{array}{rrrr} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{array} \right]$

then $\mb {X^{**}}$ is also a generating set for $\mb {L}$ . An estimable $\mb {L}$ generated from $\mb {X^{**}}$ can be represented symbolically as

$\mb {L} = (\mathit{L1},~ \mathit{L2},~ \mathit{L3},~ \mathit{L1}-\mathit{L2}-\mathit{L3}) ~$

Note that, again, the first element of $\mb {L}$ is equal to the sum of the other elements.
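The row operations that take $\mb {X^*}$ to $\mb {X^{**}}$ amount to Gauss-Jordan elimination. The sketch below is one way to reproduce them, assuming exact rational arithmetic; the function `rref` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

X_STAR = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]


def rref(matrix):
    """Reduced row echelon form via Gauss-Jordan elimination (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Scale the pivot row so the pivot equals 1.
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]
        # Clear the pivot column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return rows


X_star_star = [[int(v) for v in row] for row in rref(X_STAR)]
print(X_star_star)  # [[1, 0, 0, 1], [0, 1, 0, -1], [0, 0, 1, -1]]

# Every row still satisfies the rule: first element = sum of the other elements.
print(all(row[0] == sum(row[1:]) for row in X_star_star))  # True
```

Because row operations do not change the row space, both generating sets describe the same family of estimable functions.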

With multiple generating sets available, the question arises as to which one is the best to represent $\mb {L}$ symbolically. Clearly, a generating set containing a minimum of rows (of full row rank) and a maximum of zero elements is desirable.

The generalized $g_2$-inverse $(\mb {X'X})^{-}$ of $\mb {X'X}$ computed by the modified sweep operation (Goodnight, 1979) has the property that $(\mb {X'X})^{-}\mb {X'X}$ usually contains numerous zeros. For this reason, PROC GLM uses the nonzero rows of $(\mb {X'X})^{-}\mb {X'X}$ to represent $\mb {L}$ symbolically.
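To make this concrete for the example, the following simplified sketch sweeps $\mb {X'X}$ pivot by pivot and zeroes the row and column of any pivot that is exactly zero. It is an assumption-laden illustration, not the PROC GLM implementation: it uses exact rational arithmetic rather than a numerical singularity tolerance, and the names `cross_product`, `sweep_g2`, and `matmul` are hypothetical.

```python
from fractions import Fraction

X = [
    [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 0, 1, 0], [1, 0, 1, 0],
    [1, 0, 0, 1], [1, 0, 0, 1],
]


def cross_product(X):
    """Return X'X with exact rational entries."""
    p = len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(p)]
            for i in range(p)]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def sweep_g2(A):
    """Sweep every pivot in order; a zero pivot has its row and column zeroed."""
    A = [row[:] for row in A]
    n = len(A)
    for k in range(n):
        d = A[k][k]
        if d == 0:  # assumption: exact test; real software compares to a tolerance
            for j in range(n):
                A[k][j] = Fraction(0)
                A[j][k] = Fraction(0)
            continue
        A[k] = [v / d for v in A[k]]
        for i in range(n):
            if i == k:
                continue
            b = A[i][k]
            A[i] = [a - b * c for a, c in zip(A[i], A[k])]
            A[i][k] = -b / d
        A[k][k] = 1 / d
    return A


XtX = cross_product(X)
G = sweep_g2(XtX)          # a generalized inverse of X'X
GA = matmul(G, XtX)        # (X'X)^- X'X
nonzero_rows = [[int(v) for v in row] for row in GA if any(v != 0 for v in row)]
print(nonzero_rows)  # [[1, 0, 0, 1], [0, 1, 0, -1], [0, 0, 1, -1]]
```

For this design the nonzero rows coincide with $\mb {X^{**}}$, so the symbolic form is $(\mathit{L1},~ \mathit{L2},~ \mathit{L3},~ \mathit{L1}-\mathit{L2}-\mathit{L3})$.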

If the generating set represented symbolically is of full row rank, the number of symbols $(\mathit{L1}, \mathit{L2}, \ldots )$ represents the maximum rank of any testable hypothesis (in other words, the maximum number of linearly independent rows for any $\mb {L}$ matrix that can be constructed). By letting each symbol in turn take on the value of 1 while the others are set to 0, the original generating set can be reconstructed.
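The reconstruction described here is easy to demonstrate. In this sketch, `general_form` encodes the symbolic $\mb {L}$ obtained from $\mb {X^{**}}$, and `reconstruct_generating_set` is a hypothetical helper that sets each symbol to 1 in turn while holding the others at 0.

```python
def general_form(l1, l2, l3):
    """Symbolic form L = (L1, L2, L3, L1 - L2 - L3) evaluated at given values."""
    return [l1, l2, l3, l1 - l2 - l3]


def reconstruct_generating_set(form, n_symbols):
    """Evaluate `form` at each unit vector of symbol values."""
    rows = []
    for k in range(n_symbols):
        symbols = [1 if i == k else 0 for i in range(n_symbols)]
        rows.append(form(*symbols))
    return rows


generating_set = reconstruct_generating_set(general_form, 3)
print(generating_set)  # [[1, 0, 0, 1], [0, 1, 0, -1], [0, 0, 1, -1]]
```

Three symbols appear in the general form, so no testable hypothesis for this model can have rank greater than 3.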