The MI Procedure

Combining Inferences from Multiply Imputed Data Sets

With $m$ imputations, $m$ different sets of the point and variance estimates for a parameter $Q$ can be computed. Suppose $\hat{Q}_i$ and $\hat{U}_i$ are the point and variance estimates from the $i$th imputed data set, $i = 1, 2, \ldots, m$. Then the combined point estimate for $Q$ from multiple imputation is the average of the $m$ complete-data estimates:

$\overline{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$

Suppose $\overline{U}$ is the within-imputation variance, which is the average of the $m$ complete-data variance estimates,

$\overline{U} = \frac{1}{m} \sum_{i=1}^{m} \hat{U}_i$

and $B$ is the between-imputation variance

$B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{Q}_i - \overline{Q}\right)^2$

Then the variance estimate associated with $\overline{Q}$ is the total variance (Rubin 1987)

$T = \overline{U} + \left(1 + \frac{1}{m}\right) B$
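As a minimal Python sketch, assuming the $m$ complete-data estimates $\hat{Q}_i$ and their variance estimates $\hat{U}_i$ are already available as lists, the combining rules above can be written as follows. The function name `pool_rubin` and the input numbers are hypothetical illustrations, not the MI procedure's own code.

```python
from statistics import mean

def pool_rubin(estimates, variances):
    """Combine m complete-data point estimates Q_hat_i and their variance
    estimates U_hat_i with Rubin's (1987) rules. Returns (Q_bar, U_bar, B, T)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 paired estimates and variances")
    q_bar = mean(estimates)                                   # combined point estimate
    u_bar = mean(variances)                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                               # total variance
    return q_bar, u_bar, b, t

# Hypothetical estimates from m = 5 imputed data sets (illustrative numbers only)
q_hat = [10.2, 9.8, 10.5, 10.1, 9.9]
u_hat = [0.40, 0.38, 0.42, 0.41, 0.39]
q_bar, u_bar, b, t = pool_rubin(q_hat, u_hat)
print(q_bar, u_bar, b, t)
```

Here $B$ uses the divisor $m - 1$, and the factor $1 + 1/m$ in $T$ accounts for using a finite number of imputations.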

The statistic $(Q - \overline{Q})\,T^{-1/2}$ is approximately distributed as $t$ with $\nu_m$ degrees of freedom (Rubin 1987), where

$\nu_m = (m-1)\left[1 + \frac{\overline{U}}{(1 + m^{-1})B}\right]^2$

The degrees of freedom $\nu_m$ depend on $m$ and the ratio

$r = \frac{(1 + m^{-1})B}{\overline{U}}$

The ratio $r$ is called the relative increase in variance due to nonresponse (Rubin 1987). When there is no missing information about $Q$, the values of $r$ and $B$ are both zero. With a large value of $m$ or a small value of $r$, the degrees of freedom $\nu_m$ will be large and the distribution of $(Q - \overline{Q})\,T^{-1/2}$ will be approximately normal.
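A minimal Python sketch of $r$ and $\nu_m$, assuming $\overline{U}$, $B$, and $m$ have already been computed; `v_m` stands for $\nu_m$, and the function name and inputs are hypothetical. Because $\overline{U}/[(1 + m^{-1})B] = 1/r$, the degrees of freedom can be written as $(m-1)(1 + 1/r)^2$.

```python
import math

def relative_increase_and_df(u_bar, b, m):
    """Return (r, v_m): the relative increase in variance due to nonresponse
    and Rubin's (1987) degrees of freedom for the t reference distribution.

    u_bar: within-imputation variance U-bar; b: between-imputation variance B;
    m: number of imputations.
    """
    if b == 0:
        # No missing information: r = 0 and v_m grows without bound,
        # so the t reference distribution becomes the normal distribution.
        return 0.0, math.inf
    r = (1 + 1 / m) * b / u_bar
    v_m = (m - 1) * (1 + 1 / r) ** 2   # same as (m-1)[1 + U_bar/((1+1/m)B)]^2
    return r, v_m

# Hypothetical inputs: U_bar = 0.4 and B = 0.075 from m = 5 imputations
r, v_m = relative_increase_and_df(0.4, 0.075, 5)
print(f"r = {r:.3f}, v_m = {v_m:.1f}")
```

Returning `math.inf` when $B = 0$ is a design choice for this sketch; it mirrors the limiting case described above rather than any documented PROC MI behavior.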

Another useful statistic is the fraction of missing information about $Q$:

$\hat{\lambda} = \frac{r + 2/(\nu_m + 3)}{r + 1}$

Both statistics $r$ and $\hat{\lambda}$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about $Q$.
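The fraction of missing information is a one-line computation once $r$ and $\nu_m$ are known. The sketch below is a hypothetical helper (not part of any SAS interface) and uses made-up inputs; `v_m` stands for $\nu_m$.

```python
import math

def fraction_missing_info(r, v_m):
    """Estimated fraction of missing information about Q:
    lambda_hat = (r + 2 / (v_m + 3)) / (r + 1)."""
    return (r + 2 / (v_m + 3)) / (r + 1)

# With unbounded degrees of freedom the 2/(v_m + 3) term vanishes, so the
# fraction reduces to r / (r + 1).
print(fraction_missing_info(0.25, math.inf))
# Hypothetical finite case: r = 0.225 and v_m of about 118.6
print(fraction_missing_info(0.225, 118.6))
```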

When the complete-data degrees of freedom $\nu_0$ are small and only a modest proportion of the data is missing, the computed degrees of freedom $\nu_m$ can be much larger than $\nu_0$, which is inappropriate. For example, with $m = 5$ and $r = 10\%$, the computed degrees of freedom are $\nu_m = 484$, far more than a data set with fewer than $484$ complete-data degrees of freedom can support.
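The value $484$ follows from writing $\nu_m$ in terms of $r$, since $\overline{U}/[(1 + m^{-1})B] = 1/r$:

$\nu_m = (m-1)\left(1 + \frac{1}{r}\right)^2 = (5-1)\left(1 + \frac{1}{0.1}\right)^2 = 4 \times 11^2 = 484$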

Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom

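$\nu_m^* = \left[\frac{1}{\nu_m} + \frac{1}{\hat{\nu}_{\text{obs}}}\right]^{-1}$

where $\hat{\nu}_{\text{obs}} = (1 - \gamma)\,\nu_0\,\frac{\nu_0 + 1}{\nu_0 + 3}$ and $\gamma = \frac{(1 + m^{-1})B}{T}$.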

Note that the MI procedure uses the adjusted degrees of freedom, $\nu_m^*$, for inference.
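The Barnard and Rubin adjustment can be sketched in Python as follows. This illustrates the formulas for $\nu_m^*$, $\hat{\nu}_{\text{obs}}$, and $\gamma$ only; it is not the MI procedure's source, and the function name and the example value $\nu_0 = 20$ are hypothetical. In the code, `v0`, `v_m`, and `v_obs` stand for $\nu_0$, $\nu_m$, and $\hat{\nu}_{\text{obs}}$.

```python
def barnard_rubin_df(u_bar, b, m, v0):
    """Adjusted degrees of freedom v_m* (Barnard and Rubin 1999).

    u_bar: within-imputation variance U-bar; b: between-imputation variance B;
    m: number of imputations; v0: complete-data degrees of freedom.
    """
    t = u_bar + (1 + 1 / m) * b                       # total variance T
    gamma = (1 + 1 / m) * b / t                       # gamma = (1 + 1/m) B / T
    v_obs = (1 - gamma) * v0 * (v0 + 1) / (v0 + 3)    # observed-data df estimate
    if b == 0:
        # v_m is infinite here, so 1/v_m contributes nothing and v_m* = v_obs.
        return v_obs
    r = (1 + 1 / m) * b / u_bar                       # relative increase in variance
    v_m = (m - 1) * (1 + 1 / r) ** 2                  # Rubin (1987) degrees of freedom
    return 1 / (1 / v_m + 1 / v_obs)

# m = 5 and r = 10% (U_bar = 1, B = 1/12), with a hypothetical v0 = 20:
# the unadjusted v_m would be 484, but v_m* stays below v0.
print(barnard_rubin_df(1.0, 1 / 12, 5, 20))
```

Because $\nu_m^*$ combines $\nu_m$ and $\hat{\nu}_{\text{obs}}$ like a harmonic sum, it can never exceed $\hat{\nu}_{\text{obs}}$, which in turn is below $\nu_0$.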