The MIANALYZE Procedure

Combining Inferences from Imputed Data Sets

With m imputations, m different sets of the point and variance estimates for a parameter Q can be computed. Suppose that $\hat{Q_ i}$ and $\hat{W_ i}$ are the point and variance estimates, respectively, from the ith imputed data set, i = 1, 2, …, m. Then the combined point estimate for Q from multiple imputation is the average of the m complete-data estimates:

\[  {\overline Q} = \frac{1}{m} \sum _{i=1}^{m} \hat{Q_ i}  \]

Suppose that ${\overline W}$ is the within-imputation variance, which is the average of the m complete-data variance estimates:

\[  {\overline W} = \frac{1}{m} \sum _{i=1}^{m} \hat{W_ i}  \]

Suppose further that B is the between-imputation variance:

\[  B = \frac{1}{m-1} \sum _{i=1}^{m} (\hat{Q_ i}-{\overline Q})^2  \]

Then the variance estimate associated with ${\overline Q}$ is the total variance (Rubin, 1987):

\[  T = {\overline W} + \left( 1+\frac{1}{m} \right) B  \]
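
The combining rules above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's implementation: the function name `combine_estimates` and the input values are hypothetical, chosen only to show the arithmetic.

```python
from statistics import mean, variance

def combine_estimates(q_hats, w_hats):
    """Combine m complete-data point and variance estimates (Rubin, 1987)."""
    m = len(q_hats)
    if m < 2 or len(w_hats) != m:
        raise ValueError("need m >= 2 paired point and variance estimates")
    q_bar = mean(q_hats)           # combined point estimate
    w_bar = mean(w_hats)           # within-imputation variance
    b = variance(q_hats)           # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b    # total variance
    return q_bar, w_bar, b, t

# Hypothetical estimates from m = 5 imputed data sets (illustrative values only)
q_hats = [10.0, 11.0, 9.0, 10.5, 9.5]
w_hats = [1.0, 1.2, 0.8, 1.1, 0.9]
q_bar, w_bar, b, t = combine_estimates(q_hats, w_hats)
```

For these illustrative inputs, the combined estimate is 10, the within-imputation variance is 1, the between-imputation variance is 0.625, and the total variance is 1 + 1.2 × 0.625 = 1.75.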

The statistic $(Q-{\overline Q}) T^{-(1/2)}$ is approximately distributed as t with $v_{m}$ degrees of freedom (Rubin, 1987), where

\[  v_{m} = (m-1) {\left[ 1 + \frac{{\overline W}}{(1+m^{-1})B} \right]}^2  \]

The degrees of freedom $v_{m}$ depend on m and the ratio

\[  r = \frac{(1+m^{-1})B}{\overline W}  \]

The ratio r is called the relative increase in variance due to nonresponse (Rubin, 1987). When there is no missing information about Q, the values of r and B are both zero. With a large value of m or a small value of r, the degrees of freedom $v_{m}$ will be large and the distribution of $(Q-{\overline Q}) T^{-(1/2)}$ will be approximately normal.
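
As a rough sketch of how r and $v_{m}$ follow from ${\overline W}$, B, and m, the Python functions below evaluate the two formulas directly. The function names and input values are hypothetical, and returning infinity when B = 0 is a convention assumed here for the limiting case, not something the text specifies.

```python
import math

def relative_increase(w_bar, b, m):
    """Relative increase in variance: r = (1 + 1/m) * B / W-bar."""
    return (1 + 1 / m) * b / w_bar

def rubin_df(w_bar, b, m):
    """Degrees of freedom: v_m = (m - 1) * (1 + 1/r)^2."""
    r = relative_increase(w_bar, b, m)
    if r == 0:
        # Assumed convention: no between-imputation variance, so use the normal limit
        return math.inf
    return (m - 1) * (1 + 1 / r) ** 2

# Illustrative values only: W-bar = 1.0, B = 0.625, m = 5
r = relative_increase(w_bar=1.0, b=0.625, m=5)
v_m = rubin_df(w_bar=1.0, b=0.625, m=5)
```

With these inputs r is 0.75 and $v_{m}$ is about 21.8. Shrinking B toward zero, or increasing m while r stays fixed, makes $v_{m}$ grow, consistent with the normal approximation described above.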

Another useful statistic is the fraction of missing information about Q:

\[  \hat{\lambda } = \frac{r+2/(v_{m}+3)}{r+1}  \]

Both statistics r and $\hat{\lambda }$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about Q.
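
The fraction of missing information can be computed from r and $v_{m}$ as in the following Python sketch. The function name is hypothetical, and the inputs (m = 5, r = 0.10) are illustrative values rather than results from any data set.

```python
def fraction_missing_info(r, v_m):
    """Fraction of missing information: (r + 2 / (v_m + 3)) / (r + 1)."""
    return (r + 2 / (v_m + 3)) / (r + 1)

# Illustrative inputs: m = 5 imputations and r = 0.10
m, r = 5, 0.10
v_m = (m - 1) * (1 + 1 / r) ** 2        # degrees of freedom from m and r
lam = fraction_missing_info(r, v_m)     # roughly 0.095
```

For small r and large $v_{m}$, the result is close to r / (r + 1), so the two diagnostics tend to tell a similar story.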

When the complete-data degrees of freedom $v_{0}$ are small and only a modest proportion of the data is missing, the computed degrees of freedom $v_{m}$ can be much larger than $v_{0}$, which is inappropriate. For example, with m = 5 and r = 10%, the computed degrees of freedom are $v_{m}=484$, which is too large for any data set whose complete-data degrees of freedom are less than 484.
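
The value 484 follows from writing $v_{m}$ in terms of r and substituting m = 5 and r = 0.1:

\[  v_{m} = (m-1) {\left[ 1 + \frac{1}{r} \right]}^2 = 4 \, {\left[ 1 + \frac{1}{0.1} \right]}^2 = 4 \times 11^2 = 484  \]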

Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom

\[  v_{m}^{*} = \,  \left[ \frac{1}{v_{m}} + \frac{1}{\hat{v}_{\mathit{obs}}} \right] ^{-1}  \]

where $\hat{v}_{\mathit{obs}} = (1 - \gamma ) \,  v_{0} (v_{0}+1) / (v_{0}+3)$ and $\gamma = (1+m^{-1}) B / T$.

If you specify the complete-data degrees of freedom $v_{0}$ with the EDF= option, the MIANALYZE procedure uses the adjusted degrees of freedom, $v_{m}^{*}$, for inference. Otherwise, the degrees of freedom $v_{m}$ are used.
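
To show how the adjustment behaves, the following Python sketch evaluates $v_{m}^{*}$ from ${\overline W}$, B, m, and $v_{0}$. It is only an illustration of the formulas above, not the procedure's code: the function name `adjusted_df`, the choice $v_{0} = 20$, and the other inputs are hypothetical, and the handling of B = 0 is an assumed convention.

```python
def adjusted_df(w_bar, b, m, v0):
    """Adjusted degrees of freedom v_m* (Barnard and Rubin, 1999)."""
    t = w_bar + (1 + 1 / m) * b                      # total variance
    gamma = (1 + 1 / m) * b / t
    v_obs = (1 - gamma) * v0 * (v0 + 1) / (v0 + 3)
    if b == 0:
        # Assumed convention: v_m is unbounded, so 1 / v_m contributes nothing
        return v_obs
    r = (1 + 1 / m) * b / w_bar                      # relative increase in variance
    v_m = (m - 1) * (1 + 1 / r) ** 2                 # unadjusted degrees of freedom
    return 1 / (1 / v_m + 1 / v_obs)

# Illustrative inputs: m = 5, r = 0.10 (so v_m = 484), and v0 = 20
m, w_bar = 5, 1.0
b = 0.10 * w_bar / (1 + 1 / m)                       # B chosen so that r = 0.10
v_star = adjusted_df(w_bar, b, m, v0=20)             # about 16.05
```

In this example the unadjusted $v_{m}$ is 484, while the adjusted value is about 16, which stays below the complete-data degrees of freedom of 20.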