The MIANALYZE Procedure

Multivariate Inferences

Multivariate inference based on Wald tests can be done with m imputed data sets. The approach is a generalization of the approach taken in the univariate case (Rubin 1987, p. 137; Schafer 1997, p. 113). Suppose that $\hat{\mb{Q}_ i}$ and $\hat{\mb{W}_ i}$ are the point and covariance matrix estimates for a p-dimensional parameter $\mb{Q}$ (such as a multivariate mean) from the $i\mr{th}$ imputed data set, i = 1, 2, …, m. Then the combined point estimate for $\mb{Q}$ from the multiple imputation is the average of the m complete-data estimates:

\[ \overline{\mb{Q}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{Q}_ i} \]

Suppose that $\overline{\mb{W}}$ is the within-imputation covariance matrix, which is the average of the m complete-data covariance estimates:

\[ \overline{\mb{W}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{W}_ i} \]

Suppose also that $\mb{B}$ is the between-imputation covariance matrix:

\[ \mb{B} = \frac{1}{m-1} \sum _{i=1}^{m} (\hat{\mb{Q}_ i}-\overline{\mb{Q}}) (\hat{\mb{Q}_ i}-\overline{\mb{Q}})' \]

Then the covariance matrix associated with $\overline{\mb{Q}}$ is the total covariance matrix

\[ \mb{T}_{0} = \overline{\mb{W}} + (1+\frac{1}{m})\mb{B} \]
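The combining rules above can be sketched in a few lines of plain Python. This is an illustration only, not how PROC MIANALYZE is implemented; the function name `pool_estimates` and the input values are hypothetical, chosen for m = 4 imputations and p = 2 parameters.

```python
def pool_estimates(q_hats, w_hats):
    """Combine m complete-data estimates into Q-bar, W-bar, B, and T0."""
    m = len(q_hats)       # number of imputed data sets
    p = len(q_hats[0])    # dimension of the parameter vector
    if m < 2:
        raise ValueError("at least two imputations are needed to estimate B")
    # Combined point estimate: average of the m point estimates.
    q_bar = [sum(q[j] for q in q_hats) / m for j in range(p)]
    # Within-imputation covariance: average of the m covariance estimates.
    w_bar = [[sum(w[j][k] for w in w_hats) / m for k in range(p)]
             for j in range(p)]
    # Between-imputation covariance: sample covariance of the point estimates.
    b = [[sum((q[j] - q_bar[j]) * (q[k] - q_bar[k]) for q in q_hats) / (m - 1)
          for k in range(p)] for j in range(p)]
    # Total covariance: W-bar + (1 + 1/m) B.
    t0 = [[w_bar[j][k] + (1 + 1 / m) * b[j][k] for k in range(p)]
          for j in range(p)]
    return q_bar, w_bar, b, t0

# Hypothetical complete-data results (assumed values, not from the source).
q_hats = [[1.0, 2.0], [1.2, 2.2], [0.8, 2.2], [1.0, 1.6]]
w_hats = [
    [[0.10, 0.02], [0.02, 0.20]],
    [[0.12, 0.01], [0.01, 0.18]],
    [[0.08, 0.03], [0.03, 0.22]],
    [[0.10, 0.02], [0.02, 0.20]],
]
q_bar, w_bar, b, t0 = pool_estimates(q_hats, w_hats)
print("Q-bar:", q_bar)
print("T0:", t0)
```

Nested lists keep the sketch dependency-free; an array library would be the more usual choice for larger p.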

The natural multivariate extension of the t statistic used in the univariate case is the F statistic

\[ F_{0} = (\mb{Q}-\overline{\mb{Q}})' \mb{T}_{0}^{-1} (\mb{Q}-\overline{\mb{Q}}) \]

with degrees of freedom p and

\[ v=(m-1)(1+1/r)^{2} \]

where

\[ r = (1+\frac{1}{m}) \, \mr{trace} (\mb{B} \overline{\mb{W}}^{-1}) / p \]

is an average relative increase in variance due to nonresponse (Rubin 1987, p. 137; Schafer 1997, p. 114).
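As a rough numerical illustration of r and v, the following sketch evaluates both formulas for hypothetical inputs (m = 5, p = 2, and assumed values of $\overline{\mb{W}}$ and $\mb{B}$). The helper `invert` is a plain Gauss-Jordan routine written for this example; it is not part of any SAS interface.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a copy of `a` with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def relative_increase(b, w_bar, m):
    """r = (1 + 1/m) * trace(B W-bar^{-1}) / p."""
    p = len(b)
    w_inv = invert(w_bar)
    trace = sum(b[j][k] * w_inv[k][j] for j in range(p) for k in range(p))
    return (1 + 1 / m) * trace / p

# Hypothetical inputs (assumed values, not from the source).
m = 5
w_bar = [[0.10, 0.02], [0.02, 0.20]]
b = [[0.02, 0.00], [0.00, 0.08]]

r = relative_increase(b, w_bar, m)
v = (m - 1) * (1 + 1 / r) ** 2   # degrees of freedom paired with F0
print("r =", r, "v =", v)
```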

However, the reference distribution of the statistic $F_{0}$ is not easily derived. For small m in particular, the between-imputation covariance matrix $\mb{B}$ is an unstable estimate, and it does not have full rank when $m \le p$ (Schafer 1997, p. 113).
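One way to see the rank deficiency, offered here as a supporting step rather than as a statement from the cited sources: the deviations of the m point estimates from their average sum to zero, so at most m - 1 of them are linearly independent, and $\mb{B}$ is a scaled sum of their outer products.

\[ \sum _{i=1}^{m} (\hat{\mb{Q}_ i}-\overline{\mb{Q}}) = \mb{0} \quad \Longrightarrow \quad \mr{rank} (\mb{B}) \le \min (p, \, m-1) \]

For example, with m = 3 imputations and p = 5 parameters, $\mb{B}$ has rank at most 2.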

One solution is to assume additionally that the population between-imputation and within-imputation covariance matrices are proportional to each other (Schafer 1997, p. 113). This assumption implies that the fractions of missing information for all components of $\mb{Q}$ are equal. Under this assumption, a more stable estimate of the total covariance matrix is

\[ \mb{T} = (1+r) \overline{\mb{W}} \]
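The form of $\mb{T}$ can be motivated by a short calculation, given here as an illustrative sketch. The proportionality constant $\lambda$ is a hypothetical symbol introduced for this illustration and does not appear in the cited sources. If the sample matrices satisfied $\mb{B} = \lambda \overline{\mb{W}}$ exactly, then

\[ r = (1+\frac{1}{m}) \, \mr{trace} (\lambda \overline{\mb{W}} \, \overline{\mb{W}}^{-1}) / p = (1+\frac{1}{m}) \lambda \]

\[ \mb{T}_{0} = \overline{\mb{W}} + (1+\frac{1}{m}) \lambda \overline{\mb{W}} = (1+r) \overline{\mb{W}} = \mb{T} \]

In practice the sample $\mb{B}$ is only roughly proportional to $\overline{\mb{W}}$, so $\mb{T}$ replaces the noisy matrix $\mb{B}$ with the scalar summary r.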

With the total covariance matrix $\mb{T}$, the F statistic (Rubin 1987, p. 137)

\[ F = (\mb{Q}-\overline{\mb{Q}})' \mb{T}^{-1} (\mb{Q}-\overline{\mb{Q}}) / p \]

has an F distribution with degrees of freedom p and $v_{1}$, where

\[ v_{1} = \frac{1}{2} (p+1) (m-1) (1+\frac{1}{r})^{2} \]
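The statistic F and the degrees of freedom $v_{1}$ can be evaluated as in the following sketch. All inputs are hypothetical (m = 3, p = 2, an assumed r = 0.5, and a hypothesized value `q0` for $\mb{Q}$), and the helper `solve` is a small Gaussian elimination routine written for this example so that $\mb{T}$ never has to be inverted explicitly.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def f_statistic(q0, q_bar, w_bar, r):
    """F = (Q - Q-bar)' T^{-1} (Q - Q-bar) / p with T = (1 + r) W-bar."""
    p = len(q_bar)
    t = [[(1 + r) * w_bar[j][k] for k in range(p)] for j in range(p)]
    d = [q0[j] - q_bar[j] for j in range(p)]
    x = solve(t, d)                       # x = T^{-1} d
    return sum(d[j] * x[j] for j in range(p)) / p

# Hypothetical inputs (assumed values, not from the source).
m, p, r = 3, 2, 0.5
q0 = [0.0, 0.0]                           # hypothesized value of Q
q_bar = [1.0, 2.0]
w_bar = [[0.10, 0.02], [0.02, 0.20]]

f_stat = f_statistic(q0, q_bar, w_bar, r)
v1 = 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
print("F =", f_stat, "with df", p, "and", v1)
```

A p-value would then come from the upper tail of an F distribution with p and $v_{1}$ degrees of freedom, which needs a statistics library and is left out of this sketch.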

For $t=p(m-1) \leq 4$, PROC MIANALYZE uses the degrees of freedom $v_{1}$ in the analysis. For $t=p(m-1) > 4$, PROC MIANALYZE uses $v_{2}$, a better approximation of the degrees of freedom, which is given by Li, Raghunathan, and Rubin (1991):

\[ v_{2} = 4 + (t-4) \left[ 1+ \frac{1}{r} (1-\frac{2}{t}) \right]^{2} \]
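The switching rule between $v_{1}$ and $v_{2}$ can be written as a small function. This is a sketch of the rule as stated in this section, with a hypothetical function name and hypothetical inputs; it is not taken from the SAS implementation.

```python
def denominator_df(p, m, r):
    """Return v1 when t = p(m-1) <= 4, and v2 otherwise."""
    if r <= 0:
        # Both formulas divide by r, so B = 0 (r = 0) needs separate handling.
        raise ValueError("r must be positive")
    t = p * (m - 1)
    if t <= 4:
        # v1 = (1/2)(p+1)(m-1)(1 + 1/r)^2
        return 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    # v2 = 4 + (t-4) [1 + (1/r)(1 - 2/t)]^2
    return 4 + (t - 4) * (1 + (1 / r) * (1 - 2 / t)) ** 2

# Hypothetical cases with an assumed r = 0.5.
df_small = denominator_df(p=2, m=3, r=0.5)   # t = 4, so v1 applies
df_large = denominator_df(p=2, m=5, r=0.5)   # t = 8, so v2 applies
print(df_small, df_large)
```

With r = 0.5, the first call evaluates $v_{1} = \frac{1}{2} \cdot 3 \cdot 2 \cdot 3^{2} = 27$ and the second evaluates $v_{2} = 4 + 4 \cdot 2.5^{2} = 29$.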