The GLM Procedure

Multivariate Analysis of Variance

If you fit several dependent variables to the same effects, you might want to make joint tests involving parameters of several dependent variables. Suppose you have p dependent variables, k parameters for each dependent variable, and n observations. The models can be collected into one equation:

\[  \mb {Y} = \mb {X} \bbeta + \bepsilon  \]

where $\mb {Y}$ is $n \times p$, $\mb {X}$ is $n \times k$, $\bbeta $ is $k \times p$, and $\bepsilon $ is $n \times p$. Each of the p models can be estimated and tested separately. However, you might also want to consider the joint distribution and test the p models simultaneously.

For multivariate tests, you need to make some assumptions about the errors. With p dependent variables, there are $n \times p$ errors that are independent across observations but not across dependent variables. Assume

\[  \mbox{vec}(\bepsilon ) \sim N(\mb {0},\mb {I}_ n \otimes \bSigma )  \]

where vec$(\bepsilon )$ strings $\bepsilon $ out by rows, $\otimes $ denotes Kronecker product multiplication, and $\bSigma $ is $p \times p$. $\bSigma $ can be estimated by

\[  \mb {S} = \frac{\mb {e}’\mb {e}}{n - r} = \frac{(\mb {Y} - \mb {Xb})’(\mb {Y} - \mb {Xb})}{n - r}  \]

where $\mb {b}=(\mb {X’X})^{-}\mb {X’Y}$, r is the rank of the $\mb {X}$ matrix, and $\mb {e}$ is the matrix of residuals.
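The estimates above can be sketched numerically. The following is a minimal illustration with numpy, using simulated data (the dimensions and random inputs are assumptions for the example, not part of the procedure):

```python
import numpy as np

# Illustrative data: n observations, k parameters, p dependent variables
rng = np.random.default_rng(0)
n, k, p = 20, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])  # n x k design
Y = rng.standard_normal((n, p))                                     # n x p responses

# b = (X'X)^- X'Y  (generalized inverse, as in the text), k x p
b = np.linalg.pinv(X.T @ X) @ X.T @ Y

e = Y - X @ b                        # n x p matrix of residuals
r = np.linalg.matrix_rank(X)         # rank of the X matrix

# S = e'e / (n - r), the p x p estimate of Sigma
S = (e.T @ e) / (n - r)
```

Because all p models share the same X, the fit reduces to one multivariate least squares solve; each column of b is the usual univariate estimate for the corresponding dependent variable.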

If $\mb {S}$ is scaled to unit diagonals, its off-diagonal values are the partial correlations of the Ys adjusting for the Xs. PROC GLM displays this matrix if you specify PRINTE as a MANOVA option.
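Scaling to unit diagonals is the usual covariance-to-correlation transformation, $\mb {D}^{-1/2}\mb {S}\mb {D}^{-1/2}$ with $\mb {D} = \mbox{diag}(\mb {S})$. A small sketch, using a hypothetical $\mb {S}$:

```python
import numpy as np

# Hypothetical 2 x 2 estimate S (assumed values for illustration)
S = np.array([[4.0, 1.2],
              [1.2, 9.0]])

d = np.sqrt(np.diag(S))                 # standard deviations
partial_corr = S / np.outer(d, d)       # unit diagonal; off-diagonals are
                                        # partial correlations of the Ys given the Xs
```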

The multivariate general linear hypothesis is written

\[  \mb {L}\bbeta \mb {M} = 0  \]

You can form hypotheses for linear combinations across columns, as well as across rows of $\bbeta $.

The MANOVA statement of the GLM procedure tests special cases where $\mb {L}$ corresponds to Type I, Type II, Type III, or Type IV tests, and $\mb {M}$ is the $p \times p$ identity matrix. These tests are joint tests that the given type of hypothesis holds for all dependent variables in the model, and they are often sufficient to test all hypotheses of interest.

Finally, when these special cases are not appropriate, you can specify your own $\mb {L}$ and $\mb {M}$ matrices by using the CONTRAST statement before the MANOVA statement and the M= specification in the MANOVA statement, respectively. Another alternative is to use a REPEATED statement, which automatically generates a variety of $\mb {M}$ matrices useful in repeated measures analysis of variance. See the section REPEATED Statement and the section Repeated Measures Analysis of Variance for more information.

One useful way to think of a MANOVA analysis with an $\mb {M}$ matrix other than the identity is as an analysis of a set of transformed variables defined by the columns of the $\mb {M}$ matrix. You should note, however, that PROC GLM always displays the $\mb {M}$ matrix in such a way that the transformed variables are defined by the rows, not the columns, of the displayed $\mb {M}$ matrix.

All multivariate tests carried out by the GLM procedure first construct the matrices $\mb {H}$ and $\mb {E}$ corresponding to the numerator and denominator, respectively, of a univariate F test:

\[  \mb {H} = \mb {M}’(\mb {Lb})’ (\mb {L}(\mb {X’X})^{-}\mb {L}’)^{-1} (\mb {Lb})\mb {M}  \]
\[  \mb {E} = \mb {M}’(\mb {Y}’\mb {Y} - \mb {b}’(\mb {X’X})\mb {b})\mb {M}  \]

The diagonal elements of $\mb {H}$ and $\mb {E}$ correspond to the hypothesis and error SS for univariate tests. When the $\mb {M}$ matrix is the identity matrix (the default), these tests are for the original dependent variables on the left side of the MODEL statement. When an $\mb {M}$ matrix other than the identity is specified, the tests are for transformed variables defined by the columns of the $\mb {M}$ matrix. These tests can be studied by requesting the SUMMARY option, which produces univariate analyses for each original or transformed variable.
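The construction of $\mb {H}$ and $\mb {E}$ can be sketched directly from the formulas. In this illustration the $\mb {L}$ matrix (a joint test of two slope parameters) and the simulated data are assumptions made for the example; $\mb {M}$ is the identity, as in the default case:

```python
import numpy as np

# Illustrative data
rng = np.random.default_rng(1)
n, k, p = 30, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Y = rng.standard_normal((n, p))

XtX_inv = np.linalg.pinv(X.T @ X)        # (X'X)^-
b = XtX_inv @ X.T @ Y                    # k x p estimates

# Hypothetical L: jointly test both slope parameters; default M: identity
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
M = np.eye(p)

# H = M'(Lb)' (L (X'X)^- L')^{-1} (Lb) M   (hypothesis SSCP)
Lb = L @ b
H = M.T @ Lb.T @ np.linalg.inv(L @ XtX_inv @ L.T) @ Lb @ M

# E = M'(Y'Y - b'(X'X)b) M                 (error SSCP)
E = M.T @ (Y.T @ Y - b.T @ (X.T @ X) @ b) @ M
```

With $\mb {M} = \mb {I}$, the error matrix reduces to the residual cross-product matrix $\mb {e}’\mb {e}$, so its diagonal elements are the univariate error sums of squares.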

Four multivariate test statistics, all functions of the eigenvalues of $\mb {E}^{-1}\mb {H}$ (or $(\mb {E}+\mb {H})^{-1}\mb {H}$), are constructed:

  • Wilks’ lambda = det$(\mb {E})$/det$(\mb {H}+\mb {E})$

  • Pillai’s trace = trace$(\mb {H}(\mb {H} + \mb {E})^{-1})$

  • Hotelling-Lawley trace = trace$(\mb {E}^{-1}\mb {H})$

  • Roy’s greatest root = $\lambda $, largest eigenvalue of $\mb {E}^{-1}\mb {H}$
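The four statistics above are simple functions of the eigenvalues $\lambda _ i$ of $\mb {E}^{-1}\mb {H}$: Wilks’ lambda is $\prod _ i 1/(1+\lambda _ i)$, Pillai’s trace is $\sum _ i \lambda _ i/(1+\lambda _ i)$, the Hotelling-Lawley trace is $\sum _ i \lambda _ i$, and Roy’s greatest root is $\max _ i \lambda _ i$. A sketch checking both forms, using hypothetical $\mb {H}$ and $\mb {E}$ matrices:

```python
import numpy as np

# Hypothetical H and E (assumed symmetric positive definite values, p = 2)
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])
E = np.array([[5.0, 0.5],
              [0.5, 3.0]])

eigvals = np.linalg.eigvals(np.linalg.inv(E) @ H).real

wilks  = np.linalg.det(E) / np.linalg.det(H + E)   # Wilks' lambda
pillai = np.trace(H @ np.linalg.inv(H + E))        # Pillai's trace
hl     = np.trace(np.linalg.inv(E) @ H)            # Hotelling-Lawley trace
roy    = eigvals.max()                             # Roy's greatest root
```

Note that Wilks’ lambda decreases toward 0 as the hypothesis matrix grows relative to the error matrix, while the three trace-based statistics increase.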

By default, all four are reported with p-values based on F approximations, as discussed in the Multivariate Tests section in Chapter 4: Introduction to Regression Procedures. Alternatively, if you specify MSTAT=EXACT in the associated MANOVA or REPEATED statement, p-values for three of the four tests (Wilks’ lambda, the Hotelling-Lawley trace, and Roy’s greatest root) are computed exactly, and the p-value for the fourth (Pillai’s trace) is based on an F approximation that is more accurate than the default. See the Multivariate Tests section in Chapter 4: Introduction to Regression Procedures for more details on the exact calculations.