# The MI Procedure

### Combining Inferences from Multiply Imputed Data Sets

With $m$ imputations, $m$ different sets of the point and variance estimates for a parameter $Q$ can be computed. Suppose $\hat{Q}_i$ and $\hat{U}_i$ are the point and variance estimates from the $i$th imputed data set, $i = 1, 2, \ldots, m$. Then the combined point estimate for $Q$ from multiple imputation is the average of the $m$ complete-data estimates:

$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$$

Suppose $\bar{U}$ is the within-imputation variance, which is the average of the $m$ complete-data variance estimates,

$$\bar{U} = \frac{1}{m} \sum_{i=1}^{m} \hat{U}_i$$

and $B$ is the between-imputation variance

$$B = \frac{1}{m - 1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2$$

Then the variance estimate associated with $\bar{Q}$ is the total variance (Rubin 1987)

$$T = \bar{U} + \left( 1 + \frac{1}{m} \right) B$$
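The combining rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MI procedure's implementation: the function name `pool_estimates` is hypothetical, and the five point and variance estimates in the usage lines are made-up values chosen so the arithmetic is easy to check by hand.

```python
from statistics import fmean


def pool_estimates(q_hats, u_hats):
    """Hypothetical helper: combine m complete-data estimates.

    q_hats: point estimates Q-hat_i; u_hats: variance estimates U-hat_i.
    Returns (Q-bar, U-bar, B, T) as defined in the combining rules.
    """
    m = len(q_hats)
    if m < 2 or len(u_hats) != m:
        raise ValueError("need m >= 2 point estimates and m variance estimates")
    q_bar = fmean(q_hats)  # combined point estimate: average of the Q-hat_i
    u_bar = fmean(u_hats)  # within-imputation variance: average of the U-hat_i
    # Between-imputation variance: sample variance of the Q-hat_i (divisor m - 1)
    b = sum((q - q_bar) ** 2 for q in q_hats) / (m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, u_bar, b, t


# Made-up estimates from m = 5 imputed data sets
q_bar, u_bar, b, t = pool_estimates([10.0, 12.0, 11.0, 13.0, 9.0],
                                    [4.0, 4.2, 3.8, 4.1, 3.9])
print(round(q_bar, 6), round(u_bar, 6), round(b, 6), round(t, 6))  # 11.0 4.0 2.5 7.0
```

Here the deviations of the point estimates from $\bar{Q} = 11$ are $-1, 1, 0, 2, -2$, so $B = 10/4 = 2.5$ and $T = 4 + 1.2 \times 2.5 = 7$.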

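The statistic $(Q - \bar{Q})\, T^{-1/2}$ is approximately distributed as $t$ with $v_m$ degrees of freedom (Rubin 1987), where

$$v_m = (m - 1) \left[ 1 + \frac{\bar{U}}{(1 + m^{-1}) B} \right]^2$$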

The degrees of freedom $v_m$ depend on $m$ and the ratio

$$r = \frac{(1 + m^{-1}) B}{\bar{U}}$$

The ratio $r$ is called the relative increase in variance due to nonresponse (Rubin 1987). When there is no missing information about $Q$, the values of $r$ and $B$ are both zero. With a large value of $m$ or a small value of $r$, the degrees of freedom $v_m$ will be large and the distribution of $(Q - \bar{Q})\, T^{-1/2}$ will be approximately normal.
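A small Python sketch of $r$ and $v_m$ follows; the function names are hypothetical. It relies on the fact that the ratio inside the brackets of $v_m$ is exactly $1/r$, so $v_m = (m - 1)(1 + 1/r)^2$. The formula is undefined when $B = 0$; returning infinity there is an assumption of this sketch, chosen to match the limiting normal distribution described above.

```python
import math


def relative_increase(u_bar, b, m):
    """r = (1 + 1/m) B / U-bar: relative increase in variance due to nonresponse."""
    return (1 + 1 / m) * b / u_bar


def rubin_df(r, m):
    """v_m = (m - 1)(1 + 1/r)^2, equivalent to the bracketed form of v_m."""
    if r == 0:
        # No between-imputation variance: treat the reference distribution
        # as normal (infinite degrees of freedom); an assumption of this sketch.
        return math.inf
    return (m - 1) * (1 + 1 / r) ** 2


# Made-up summaries: U-bar = 4.0, B = 2.5, m = 5
r = relative_increase(4.0, 2.5, 5)
print(r, round(rubin_df(r, 5), 2))  # 0.75 21.78
```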

Another useful statistic is the fraction of missing information about $Q$:

$$\hat{\lambda} = \frac{r + 2/(v_m + 3)}{r + 1}$$

Both statistics $r$ and $\hat{\lambda}$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about $Q$.
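The two diagnostics can be computed together as sketched below. This is a minimal Python illustration with hypothetical helper names and made-up values of $r$; as in the formulas, $v_m$ is written as $(m - 1)(1 + 1/r)^2$, and treating $v_m$ as infinite when $r = 0$ is an assumption of the sketch (it makes $\hat{\lambda} = 0$ in that case).

```python
import math


def missing_info_fraction(r, v_m):
    """lambda-hat = (r + 2/(v_m + 3)) / (r + 1): fraction of missing information."""
    return (r + 2 / (v_m + 3)) / (r + 1)


def rubin_df(r, m):
    """v_m = (m - 1)(1 + 1/r)^2; infinite when r = 0 (assumption of this sketch)."""
    return math.inf if r == 0 else (m - 1) * (1 + 1 / r) ** 2


# Hypothetical diagnostics for m = 5 imputations and a few values of r
m = 5
for r in (0.0, 0.1, 0.75):
    v_m = rubin_df(r, m)
    lam = missing_info_fraction(r, v_m)
    print(f"r = {r:.2f}  v_m = {v_m:.1f}  lambda-hat = {lam:.3f}")
```

For large $v_m$ the correction term $2/(v_m + 3)$ vanishes and $\hat{\lambda}$ approaches $r/(r + 1)$.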

When the complete-data degrees of freedom $v_0$ are small and there is only a modest proportion of missing data, the computed degrees of freedom $v_m$ can be much larger than $v_0$, which is inappropriate. For example, with $m = 5$ and $r = 10\%$, the computed degrees of freedom are $v_m = (m - 1)(1 + 1/r)^2 = 4 \times 11^2 = 484$, far more than a data set with complete-data degrees of freedom below 484 can support.
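Because the ratio inside the brackets of $v_m$ equals $1/r$, the unadjusted degrees of freedom are $v_m = (m - 1)(1 + 1/r)^2$ and grow rapidly as $r$ shrinks. The short Python loop below tabulates $v_m$ for $m = 5$ and a few illustrative values of $r$, including the 10% case; the helper name `rubin_df` is hypothetical.

```python
def rubin_df(r, m):
    """Unadjusted degrees of freedom v_m = (m - 1)(1 + 1/r)^2, for r > 0."""
    return (m - 1) * (1 + 1 / r) ** 2


m = 5
for r in (0.50, 0.25, 0.10, 0.05):
    print(f"r = {r:.2f}: v_m = {rubin_df(r, m):.0f}")
# r = 0.50: v_m = 36
# r = 0.25: v_m = 100
# r = 0.10: v_m = 484
# r = 0.05: v_m = 1764
```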

Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom

$$v_m^* = \left[ \frac{1}{v_m} + \frac{1}{\hat{v}_{\mathrm{obs}}} \right]^{-1}$$

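where $\hat{v}_{\mathrm{obs}} = (1 - \gamma)\, v_0 \dfrac{v_0 + 1}{v_0 + 3}$ and $\gamma = \dfrac{(1 + m^{-1}) B}{T}$.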

Note that the MI procedure uses the adjusted degrees of freedom, $v_m^*$, for inference.
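The adjustment can be sketched as follows. This is a minimal Python illustration of the Barnard and Rubin (1999) formula as written above, not the MI procedure's source code; the function `adjusted_df`, its argument names, and the input values are hypothetical, and treating $v_m$ as infinite when $B = 0$ is an assumption of the sketch.

```python
import math


def adjusted_df(u_bar, b, m, v0):
    """Hypothetical helper: Barnard-Rubin adjusted degrees of freedom v_m*.

    u_bar: within-imputation variance, b: between-imputation variance,
    m: number of imputations, v0: complete-data degrees of freedom.
    """
    t = u_bar + (1 + 1 / m) * b  # total variance T
    if b == 0:
        v_m = math.inf  # assumption of this sketch: no between-imputation variance
    else:
        v_m = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    gamma = (1 + 1 / m) * b / t
    v_obs = (1 - gamma) * v0 * (v0 + 1) / (v0 + 3)  # v-hat_obs
    # Reciprocal of the summed reciprocals of v_m and v-hat_obs
    return 1 / (1 / v_m + 1 / v_obs)


# Made-up case: m = 5 and r = 0.1 (so the unadjusted v_m is 484), but v0 = 20
print(round(adjusted_df(1.2, 0.1, 5, 20), 2))  # 16.05
```

Because $v_m^*$ is the reciprocal of a sum of reciprocals, it is smaller than both $v_m$ and $\hat{v}_{\mathrm{obs}}$, and $\hat{v}_{\mathrm{obs}}$ is itself smaller than $v_0$ for $v_0 > 0$. In the example, the adjusted value stays below the complete-data $v_0 = 20$ instead of reaching 484.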