The MIANALYZE Procedure

Multivariate Inferences

Multivariate inference based on Wald tests can be done with m imputed data sets. The approach is a generalization of the approach taken in the univariate case (Rubin 1987, p. 137; Schafer 1997, p. 113). Suppose that $\hat{\mb{Q}_ i}$ and $\hat{\mb{W}_ i}$ are the point and covariance matrix estimates for a p-dimensional parameter $\mb{Q}$ (such as a multivariate mean) from the $i\mr{th}$ imputed data set, i = 1, 2, …, m. Then the combined point estimate for $\mb{Q}$ from the multiple imputation is the average of the m complete-data estimates:

$\overline{\mb{Q}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{Q}_ i}$

Suppose that $\overline{\mb{W}}$ is the within-imputation covariance matrix, which is the average of the m complete-data covariance estimates:

$\overline{\mb{W}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{W}_ i}$

Suppose also that $\mb{B}$ is the between-imputation covariance matrix:

$\mb{B} = \frac{1}{m-1} \sum _{i=1}^{m} (\hat{\mb{Q}_ i}-\overline{\mb{Q}}) (\hat{\mb{Q}_ i}-\overline{\mb{Q}})'$

Then the covariance matrix associated with $\overline{\mb{Q}}$ is the total covariance matrix

$\mb{T}_{0} = \overline{\mb{W}} + (1+\frac{1}{m})\mb{B}$
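The combining rules above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the PROC MIANALYZE implementation; the function name `combine_estimates` and the array layout (one row of `q_hats` and one p-by-p slice of `w_hats` per imputation) are assumptions made for the example.

```python
import numpy as np

def combine_estimates(q_hats, w_hats):
    """Pool m complete-data estimates (hypothetical helper).

    q_hats: shape (m, p), one point estimate per imputed data set.
    w_hats: shape (m, p, p), one covariance estimate per imputed data set.
    Returns (q_bar, w_bar, b, t0).
    """
    q_hats = np.asarray(q_hats, dtype=float)
    w_hats = np.asarray(w_hats, dtype=float)
    m = q_hats.shape[0]

    q_bar = q_hats.mean(axis=0)        # combined point estimate
    w_bar = w_hats.mean(axis=0)        # within-imputation covariance
    dev = q_hats - q_bar               # deviations from the combined estimate
    b = dev.T @ dev / (m - 1)          # between-imputation covariance
    t0 = w_bar + (1 + 1 / m) * b       # total covariance
    return q_bar, w_bar, b, t0
```

For example, with m = 3 estimates of a p = 2 parameter, `combine_estimates([[1, 2], [3, 4], [5, 6]], [np.eye(2)] * 3)` returns the combined estimate (3, 4) together with the three covariance matrices.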

The natural multivariate extension of the t statistic used in the univariate case is the F statistic

$F_{0} = (\mb{Q}-\overline{\mb{Q}})' \mb{T}_{0}^{-1} (\mb{Q}-\overline{\mb{Q}})$

with degrees of freedom p and

$v=(m-1)(1+1/r)^{2}$

where

$r = (1+\frac{1}{m}) \, \mr{trace} (\mb{B} \overline{\mb{W}}^{-1}) / p$

is an average relative increase in variance due to nonresponse (Rubin 1987, p. 137; Schafer 1997, p. 114).

However, the reference distribution of the statistic $F_{0}$ is not easily derived. When m is small, the between-imputation covariance matrix $\mb{B}$ is unstable, and for $m \le p$ it does not have full rank (Schafer 1997, p. 113).
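The rank deficiency is easy to see numerically: $\mb{B}$ is built from m deviation vectors that sum to zero, so its rank cannot exceed m - 1. The following sketch uses simulated point estimates; the data, the seed, and the helper name `between_covariance` are assumptions made for illustration.

```python
import numpy as np

def between_covariance(q_hats):
    """Between-imputation covariance from an (m, p) array of estimates."""
    q_hats = np.asarray(q_hats, dtype=float)
    m = q_hats.shape[0]
    dev = q_hats - q_hats.mean(axis=0)
    return dev.T @ dev / (m - 1)

# Simulated estimates: m = 3 imputations of a p = 4 dimensional parameter.
rng = np.random.default_rng(0)
m, p = 3, 4
q_hats = rng.normal(size=(m, p))

b = between_covariance(q_hats)
rank_b = np.linalg.matrix_rank(b)
print(f"B is {p} x {p}, but its rank is {rank_b} (at most m - 1 = {m - 1})")
```

Because $\mb{B}$ is singular here, any inference that relies on $\mb{T}_{0}$ reflecting between-imputation variability in all p directions is unreliable at this m.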

One solution is to make an additional assumption that the population between-imputation and within-imputation covariance matrices are proportional to each other (Schafer 1997, p. 113). This assumption implies that the fractions of missing information for all components of $\mb{Q}$ are equal. Under this assumption, a more stable estimate of the total covariance matrix is

$\mb{T} = (1+r) \overline{\mb{W}}$

With the total covariance matrix $\mb{T}$, the F statistic (Rubin 1987, p. 137)

$F = (\mb{Q}-\overline{\mb{Q}})' \mb{T}^{-1} (\mb{Q}-\overline{\mb{Q}}) / p$

has an F distribution with degrees of freedom p and $v_{1}$, where

$v_{1} = \frac{1}{2} (p+1) (m-1) (1+\frac{1}{r})^{2}$
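As a rough illustration of the stabilized statistic, the sketch below computes r, $\mb{T} = (1+r)\overline{\mb{W}}$, and F for a hypothesized parameter value. It is not the procedure's own code; the function name `pooled_f_statistic` and the argument `q0` (the hypothesized value of $\mb{Q}$) are assumptions made for the example.

```python
import numpy as np

def pooled_f_statistic(q0, q_bar, w_bar, b, m):
    """Return (F, r) under the proportionality assumption (hypothetical helper).

    q0:    hypothesized parameter value, length p.
    q_bar: combined point estimate, length p.
    w_bar: within-imputation covariance, shape (p, p).
    b:     between-imputation covariance, shape (p, p).
    m:     number of imputations.
    """
    q0 = np.asarray(q0, dtype=float)
    q_bar = np.asarray(q_bar, dtype=float)
    w_bar = np.asarray(w_bar, dtype=float)
    b = np.asarray(b, dtype=float)
    p = q_bar.shape[0]

    # r = (1 + 1/m) trace(B W^-1) / p; trace(W^-1 B) equals trace(B W^-1).
    r = (1 + 1 / m) * np.trace(np.linalg.solve(w_bar, b)) / p

    t = (1 + r) * w_bar                        # stabilized total covariance
    diff = q0 - q_bar
    f = diff @ np.linalg.solve(t, diff) / p    # quadratic form divided by p
    return f, r
```

With $\overline{\mb{W}}$ equal to the 2-by-2 identity, every entry of $\mb{B}$ equal to 4, m = 3, $\overline{\mb{Q}} = (3, 4)$, and a hypothesized value of (0, 0), this gives r = 16/3 and F = 75/38.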

For $t=p(m-1) \leq 4$, PROC MIANALYZE uses the degrees of freedom $v_{1}$ in the analysis. For $t=p(m-1) > 4$, PROC MIANALYZE uses $v_{2}$, a better approximation of the degrees of freedom, which is given by Li, Raghunathan, and Rubin (1991):

$v_{2} = 4 + (t-4) \left[ 1+ \frac{1}{r} (1-\frac{2}{t}) \right]^{2}$
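The choice between $v_{1}$ and $v_{2}$ can be written as a small function. This is a minimal sketch of the rule as stated above; the name `denominator_df` and the decision to reject r <= 0 (where 1/r is undefined) are assumptions of the example, not documented behavior of the procedure.

```python
def denominator_df(p, m, r):
    """Denominator degrees of freedom for the pooled F statistic.

    p: dimension of the parameter.
    m: number of imputations.
    r: average relative increase in variance due to nonresponse (must be > 0).
    """
    if r <= 0:
        # Assumption of this sketch: 1/r is undefined, so refuse to continue.
        raise ValueError("r must be positive")

    t = p * (m - 1)
    if t <= 4:
        # v1 = (1/2)(p + 1)(m - 1)(1 + 1/r)^2
        return 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    # v2 = 4 + (t - 4)[1 + (1/r)(1 - 2/t)]^2
    return 4 + (t - 4) * (1 + (1 - 2 / t) / r) ** 2
```

For instance, with p = 2 and r = 1, m = 3 gives t = 4 and $v_{1} = 12$, whereas m = 5 gives t = 8 and $v_{2} = 16.25$.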