Combining Inferences from Imputed Data Sets
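With $m$ imputations, $m$ different sets of the point and variance estimates for a parameter $Q$ can be computed. Suppose that $\hat{Q}_i$ and $\hat{W}_i$ are the point and variance estimates, respectively, from the $i$th imputed data set, $i = 1, 2, \ldots, m$. Then the combined point estimate for $Q$ from multiple imputation is the average of the $m$ complete-data estimates:

$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$$

Suppose that $\bar{W}$ is the within-imputation variance, which is the average of the $m$ complete-data variance estimates:

$$\bar{W} = \frac{1}{m} \sum_{i=1}^{m} \hat{W}_i$$

And suppose that $B$ is the between-imputation variance:

$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2$$

Then the variance estimate associated with $\bar{Q}$ is the total variance (Rubin 1987):

$$T = \bar{W} + \left( 1 + \frac{1}{m} \right) B$$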
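The combining rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MIANALYZE implementation; the function name `pool_estimates` and the input numbers are made up for the example.

```python
from statistics import mean, variance


def pool_estimates(q_hats, w_hats):
    """Combine m complete-data estimates and variances (Rubin 1987).

    q_hats: point estimates, one per imputed data set
    w_hats: variance estimates, one per imputed data set
    """
    m = len(q_hats)
    if m < 2 or len(w_hats) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    q_bar = mean(q_hats)          # combined point estimate
    w_bar = mean(w_hats)          # within-imputation variance
    b = variance(q_hats)          # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b   # total variance
    return q_bar, w_bar, b, t


# Hypothetical estimates from m = 5 imputed data sets (illustrative values only)
q_bar, w_bar, b, t = pool_estimates(
    [10.0, 12.0, 11.0, 13.0, 9.0],
    [4.0, 5.0, 4.5, 5.5, 4.0],
)
print(q_bar, w_bar, b, t)
```

The standard error used for inference about the parameter is the square root of the total variance returned as `t`.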
The statistic $(Q - \bar{Q})\, T^{-1/2}$ is approximately distributed as $t$ with $v_m$ degrees of freedom (Rubin 1987), where

$$v_m = (m - 1) \left[ 1 + \frac{\bar{W}}{(1 + m^{-1})\, B} \right]^2$$

The degrees of freedom $v_m$ depend on $m$ and the ratio

$$r = \frac{(1 + m^{-1})\, B}{\bar{W}}$$

The ratio $r$ is called the relative increase in variance due to nonresponse (Rubin 1987). When there is no missing information about $Q$, the values of $r$ and $B$ are both zero. With a large value of $m$ or a small value of $r$, the degrees of freedom $v_m$ will be large and the distribution of $(Q - \bar{Q})\, T^{-1/2}$ will be approximately normal.

Another useful statistic is the fraction of missing information about $Q$:

$$\hat{\lambda} = \frac{r + 2 / (v_m + 3)}{r + 1}$$

Both statistics $r$ and $\hat{\lambda}$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about $Q$.
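As a rough sketch of how these diagnostics follow from the within- and between-imputation variances, the Python function below computes the relative increase in variance, the degrees of freedom, and the fraction of missing information. The name `mi_diagnostics` and the input values are hypothetical; the zero-variance branch reflects the limiting case described above and is an assumption about how one might handle it, not documented procedure behavior.

```python
import math


def mi_diagnostics(m, w_bar, b):
    """Return (r, v_m, lam) for m imputations.

    w_bar: within-imputation variance
    b:     between-imputation variance
    """
    if m < 2 or w_bar <= 0 or b < 0:
        raise ValueError("need m >= 2, w_bar > 0 and b >= 0")
    if b == 0:
        # No missing information: r = 0, degrees of freedom are unbounded
        return 0.0, math.inf, 0.0
    r = (1 + 1 / m) * b / w_bar           # relative increase in variance
    v_m = (m - 1) * (1 + 1 / r) ** 2      # degrees of freedom
    lam = (r + 2 / (v_m + 3)) / (r + 1)   # fraction of missing information
    return r, v_m, lam


# Hypothetical variances from m = 5 imputations (illustrative values only)
r, v_m, lam = mi_diagnostics(5, 4.6, 2.5)
print(r, v_m, lam)
```

Because the degrees of freedom grow with the square of `1 + 1/r`, small values of `r` quickly push the reference distribution toward the normal.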
When the complete-data degrees of freedom $v_0$ are small, and there is only a modest proportion of missing data, the computed degrees of freedom, $v_m$, can be much larger than $v_0$, which is inappropriate. For example, with $m = 5$ and $r = 10\%$, the computed degrees of freedom $v_m = 4 \times (1 + 1/0.1)^2 = 484$, which is inappropriate for data sets with complete-data degrees of freedom less than 484.
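Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom

$$v_m^{*} = \left[ \frac{1}{v_m} + \frac{1}{\hat{v}_{obs}} \right]^{-1}$$

where $\hat{v}_{obs} = (1 - \gamma)\, v_0\, (v_0 + 1) / (v_0 + 3)$ and $\gamma = (1 + m^{-1})\, B / T$.

If you specify the complete-data degrees of freedom $v_0$ with the EDF= option, the MIANALYZE procedure uses the adjusted degrees of freedom, $v_m^{*}$, for inference. Otherwise, the degrees of freedom $v_m$ are used.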
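The Barnard and Rubin (1999) adjustment can be illustrated with a short Python sketch. It is not the MIANALYZE code; the function name `adjusted_df`, the choice of 20 complete-data degrees of freedom, and the variance inputs are assumptions made for the example. The inputs are chosen so that the relative increase in variance is 10% with five imputations.

```python
def adjusted_df(m, w_bar, b, v0):
    """Adjusted degrees of freedom of Barnard and Rubin (1999).

    m:     number of imputations
    w_bar: within-imputation variance
    b:     between-imputation variance
    v0:    complete-data degrees of freedom
    """
    if m < 2 or w_bar <= 0 or b < 0 or v0 <= 0:
        raise ValueError("need m >= 2, w_bar > 0, b >= 0 and v0 > 0")
    t = w_bar + (1 + 1 / m) * b                   # total variance
    gamma = (1 + 1 / m) * b / t
    v_obs = (1 - gamma) * v0 * (v0 + 1) / (v0 + 3)
    if b == 0:
        # Unadjusted degrees of freedom are unbounded, so 1 / v_m vanishes
        return v_obs
    r = (1 + 1 / m) * b / w_bar
    v_m = (m - 1) * (1 + 1 / r) ** 2              # unadjusted degrees of freedom
    return 1 / (1 / v_m + 1 / v_obs)


# Hypothetical case: m = 5, r = 0.1, complete-data degrees of freedom 20
v_star = adjusted_df(5, 1.2, 0.1, 20)
print(v_star)
```

By construction the adjusted value is smaller than both the unadjusted degrees of freedom and the complete-data degrees of freedom, so it cannot overstate the information available in a small data set.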