The MIANALYZE Procedure

Multivariate Inferences

Multivariate inference based on Wald tests can be done with m imputed data sets. The approach is a generalization of the approach taken in the univariate case (Rubin 1987, p. 137; Schafer 1997, p. 113). Suppose that $\hat{\mb {Q}_ i}$ and $\hat{\mb {W}_ i}$ are the point and covariance matrix estimates for a p-dimensional parameter $\mb {Q}$ (such as a multivariate mean) from the $i\mr {th}$ imputed data set, i = 1, 2, …, m. Then the combined point estimate for $\mb {Q}$ from the multiple imputation is the average of the m complete-data estimates:

$\overline{\mb {Q}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb {Q}_ i}$

Suppose that $\overline{\mb {W}}$ is the within-imputation covariance matrix, which is the average of the m complete-data covariance estimates:

$\overline{\mb {W}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb {W}_ i}$

Suppose also that $\mb {B}$ is the between-imputation covariance matrix:

$\mb {B} = \frac{1}{m-1} \sum _{i=1}^{m} (\hat{\mb {Q}_ i}-\overline{\mb {Q}}) (\hat{\mb {Q}_ i}-\overline{\mb {Q}})'$

Then the covariance matrix associated with $\overline{\mb {Q}}$ is the total covariance matrix

$\mb {T}_{0} = \overline{\mb {W}} + (1+\frac{1}{m})\mb {B}$
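The combining rules above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code that PROC MIANALYZE runs; the function name `pool_estimates` and the input values are assumptions made up for the example.

```python
def pool_estimates(q_hats, w_hats):
    """Combine m complete-data estimates into (Q_bar, W_bar, B, T0).

    q_hats: list of m point estimates, each a list of p floats.
    w_hats: list of m covariance estimates, each a p x p nested list.
    (Hypothetical helper for illustration only.)
    """
    m = len(q_hats)
    p = len(q_hats[0])
    if m < 2:
        raise ValueError("need at least two imputations to estimate B")

    # Combined point estimate: average of the m complete-data estimates.
    q_bar = [sum(q[j] for q in q_hats) / m for j in range(p)]

    # Within-imputation covariance: average of the m covariance estimates.
    w_bar = [[sum(w[j][k] for w in w_hats) / m for k in range(p)]
             for j in range(p)]

    # Between-imputation covariance: sample covariance of the point estimates.
    b = [[sum((q[j] - q_bar[j]) * (q[k] - q_bar[k]) for q in q_hats) / (m - 1)
          for k in range(p)]
         for j in range(p)]

    # Total covariance: T0 = W_bar + (1 + 1/m) B.
    t0 = [[w_bar[j][k] + (1 + 1 / m) * b[j][k] for k in range(p)]
          for j in range(p)]
    return q_bar, w_bar, b, t0


# Made-up estimates for m = 3 imputations of a p = 2 parameter.
q_hats = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
w_hats = [[[1.0, 0.0], [0.0, 2.0]] for _ in range(3)]
q_bar, w_bar, b, t0 = pool_estimates(q_hats, w_hats)
```

Nested lists keep the sketch free of dependencies; with a numerical library the same four quantities reduce to a mean, a mean, a sample covariance, and a weighted sum.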

The natural multivariate extension of the t statistic used in the univariate case is the F statistic

$F_{0} = (\mb {Q}-\overline{\mb {Q}})' \mb {T}_{0}^{-1} (\mb {Q}-\overline{\mb {Q}})$

with degrees of freedom p and

$v=(m-1)(1+1/r)^{2}$

where

$r = (1+\frac{1}{m}) \, \mr {trace} (\mb {B} \overline{\mb {W}}^{-1}) / p$

is an average relative increase in variance due to nonresponse (Rubin 1987, p. 137; Schafer 1997, p. 114).
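
As an illustrative check with made-up numbers (not taken from the sources cited here), take p = 2, m = 3, and

$\mb {B} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, \quad \overline{\mb {W}} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

The diagonal elements of $\mb {B} \overline{\mb {W}}^{-1}$ are 1 and 0.5, so $\mr {trace} (\mb {B} \overline{\mb {W}}^{-1}) = 1.5$ and

$r = (1+\frac{1}{3}) (1.5) / 2 = 1, \quad v = (3-1)(1+1/1)^{2} = 8$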

However, the reference distribution of the statistic $F_{0}$ is not easily derived. The between-imputation covariance matrix $\mb {B}$ is unstable, especially for small m, and it does not have full rank when $m \le p$ (Schafer 1997, p. 113).

One solution is to assume additionally that the population between-imputation and within-imputation covariance matrices are proportional to each other (Schafer 1997, p. 113). This assumption implies that the fractions of missing information for all components of $\mb {Q}$ are equal. Under this assumption, a more stable estimate of the total covariance matrix is

$\mb {T} = (1+r) \overline{\mb {W}}$
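
To see where this form comes from, suppose, as a sample analogue of the assumption used here only for illustration, that $\mb {B} = \lambda \overline{\mb {W}}$ for some scalar $\lambda > 0$ (the symbols $\lambda$ and $\mb {I}_{p}$, the p-dimensional identity matrix, are introduced for this sketch). Then

$\mr {trace} (\mb {B} \overline{\mb {W}}^{-1}) = \mr {trace} (\lambda \mb {I}_{p}) = \lambda p, \quad \mr {so} \quad r = (1+\frac{1}{m}) \lambda$

and substituting into the total covariance matrix gives

$\mb {T}_{0} = \overline{\mb {W}} + (1+\frac{1}{m}) \lambda \overline{\mb {W}} = (1+r) \overline{\mb {W}}$

In practice the estimated $\mb {B}$ is not exactly proportional to $\overline{\mb {W}}$, so $\mb {T}$ generally differs from $\mb {T}_{0}$: it replaces the unstable $\mb {B}$ with a scalar multiple of $\overline{\mb {W}}$ that yields the same average relative increase r.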

With the total covariance matrix $\mb {T}$, the F statistic (Rubin 1987, p. 137)

$F = (\mb {Q}-\overline{\mb {Q}})' \mb {T}^{-1} (\mb {Q}-\overline{\mb {Q}}) / p$

has an F distribution with degrees of freedom p and $v_{1}$, where

$v_{1} = \frac{1}{2} (p+1) (m-1) (1+\frac{1}{r})^{2}$
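
The stabilized statistic can be sketched as follows. This is a rough standard-library illustration under stated assumptions, not the procedure's implementation: `solve` and `stabilized_f` are hypothetical helper names, the inputs are made up, and the code assumes that $\overline{\mb {W}}$ is nonsingular and that r > 0. It uses the fact that $\mb {T}^{-1} = \overline{\mb {W}}^{-1} / (1+r)$, so $\mb {B}$ never has to be inverted.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def stabilized_f(q0, q_bar, w_bar, b, m):
    """Return (F, r, v1) for the hypothesis Q = q0 (assumes r > 0)."""
    p = len(q_bar)
    # r = (1 + 1/m) trace(B W_bar^{-1}) / p; trace(B W^{-1}) = trace(W^{-1} B),
    # so solve W x = (column j of B) and accumulate x[j].
    trace = sum(solve(w_bar, [b[i][j] for i in range(p)])[j] for j in range(p))
    r = (1 + 1 / m) * trace / p
    # T = (1 + r) W_bar, hence d' T^{-1} d = d' W_bar^{-1} d / (1 + r).
    d = [q0[j] - q_bar[j] for j in range(p)]
    x = solve(w_bar, d)
    quad = sum(d[j] * x[j] for j in range(p)) / (1 + r)
    f_stat = quad / p
    v1 = 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    return f_stat, r, v1


# Made-up pooled quantities for p = 2 and m = 3, testing Q = (0, 0).
f_stat, r, v1 = stabilized_f(
    q0=[0.0, 0.0],
    q_bar=[2.0, 3.0],
    w_bar=[[1.0, 0.0], [0.0, 2.0]],
    b=[[1.0, 0.5], [0.5, 1.0]],
    m=3,
)
```

A p-value would then come from the upper tail of an F distribution with p numerator degrees of freedom, which requires a statistics library and is left out of this sketch.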

For $t=p(m-1) \leq 4$, PROC MIANALYZE uses the degrees of freedom $v_{1}$ in the analysis. For $t=p(m-1) > 4$, PROC MIANALYZE uses $v_{2}$, a better approximation of the degrees of freedom given by Li, Raghunathan, and Rubin (1991):

$v_{2} = 4 + (t-4) \left[ 1+ \frac{1}{r} (1-\frac{2}{t}) \right]^{2}$
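
The selection rule between $v_{1}$ and $v_{2}$ can be written as a small function. This is a sketch of the rule as stated in this section, with the hypothetical name `denominator_df`; it assumes m is at least 2 and r is positive, and it is not taken from the procedure's source.

```python
def denominator_df(p, m, r):
    """Denominator degrees of freedom for the stabilized F statistic.

    p: dimension of the parameter, m: number of imputations,
    r: average relative increase in variance (assumed > 0).
    """
    t = p * (m - 1)
    if t <= 4:
        # v1 = (1/2)(p + 1)(m - 1)(1 + 1/r)^2
        return 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    # v2 = 4 + (t - 4) [1 + (1/r)(1 - 2/t)]^2
    return 4 + (t - 4) * (1 + (1 - 2 / t) / r) ** 2


# p = 2, m = 3 gives t = 4, so v1 applies; p = 2, m = 5 gives t = 8, so v2 applies.
df_small_t = denominator_df(p=2, m=3, r=1.0)
df_large_t = denominator_df(p=2, m=5, r=1.0)
```

With r = 1, the first call evaluates $v_{1} = \frac{1}{2} (3)(2)(2)^{2} = 12$ and the second evaluates $v_{2} = 4 + 4 (1.75)^{2} = 16.25$.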