The MI Procedure
Schafer (1997, pp. 139-143) provides comprehensive coverage of this topic, and the following discussion is largely based on his work.
Multiple imputation inference assumes that the model used to analyze the multiply imputed data (the analyst's model) is the same as the model used to impute the missing values (the imputer's model). In practice, however, the two models may differ.
For example, consider the same trivariate data set with variables Y1 and Y2 fully observed and a variable Y3 with missing values. An imputer creates multiple imputations with the model Y3 = Y1 Y2. The analyst, however, may later use the simpler model Y3 = Y1. In this case, the analyst assumes more than the imputer: the analyst assumes that there is no relationship between Y3 and Y2.
The effect of the discrepancy between the models depends on whether the analyst's additional assumption is true. If the assumption is true, the imputer's model still applies. The inferences derived from multiple imputations will still be valid, although they may be somewhat conservative because they reflect the additional uncertainty of estimating the relationship between Y3 and Y2.
On the other hand, suppose that the analyst models Y3 = Y1 and there is in fact a relationship between Y3 and Y2. Then the model Y3 = Y1 is misspecified, and estimates derived from it will be biased. Appropriate results can be generated only from appropriate analyst's models.
Another type of discrepancy occurs when the imputer assumes more than the analyst. For example, suppose that an imputer creates multiple imputations with the model Y3 = Y1, but the analyst later fits the model Y3 = Y1 Y2. When the imputer's assumption of no relationship between Y3 and Y2 is true, the imputer's model is a correct model and the inferences still hold.
On the other hand, suppose there is a relationship between Y3 and Y2. Imputations created under the incorrect assumption that there is no relationship between Y3 and Y2 will make the analyst's estimate of the relationship biased toward zero. Multiple imputations created under an incorrect model can lead to incorrect conclusions.
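The bias toward zero described above can be illustrated with a small simulation. The following Python sketch is not PROC MI and does not reproduce its imputation methods. It assumes simulated data in which Y3 truly depends on both Y1 and Y2 with slopes of 1, about half of the Y3 values missing completely at random, and a simplified stochastic regression imputation that ignores parameter uncertainty. The function names `ols` and `impute_and_analyze`, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def impute_and_analyze(y1, y2, y3, missing, use_y2, m, rng):
    """Impute Y3 m times, fit the analyst's model Y3 = Y1 Y2 on each
    completed data set, and return the average estimated Y2 slope."""
    def design(i):
        # Imputer's predictors: intercept and Y1, plus Y2 if use_y2 is set.
        return [1.0, y1[i], y2[i]] if use_y2 else [1.0, y1[i]]

    # Fit the imputer's regression on the cases where Y3 is observed.
    obs = [i for i in range(len(y3)) if not missing[i]]
    x_obs = [design(i) for i in obs]
    y_obs = [y3[i] for i in obs]
    beta = ols(x_obs, y_obs)
    resid = [yi - sum(b * x for b, x in zip(beta, xi))
             for xi, yi in zip(x_obs, y_obs)]
    sigma = (sum(e * e for e in resid) / (len(obs) - len(beta))) ** 0.5

    analyst_x = [[1.0, a, b] for a, b in zip(y1, y2)]
    slopes = []
    for _ in range(m):
        # Simplification: predicted value plus random noise, with the
        # regression parameters held fixed across imputations.
        completed = [
            sum(b * x for b, x in zip(beta, design(i))) + rng.gauss(0.0, sigma)
            if missing[i] else y3[i]
            for i in range(len(y3))
        ]
        slopes.append(ols(analyst_x, completed)[2])
    return sum(slopes) / m

rng = random.Random(2001)
n = 2000
y1 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [rng.gauss(0, 1) for _ in range(n)]
# True model: Y3 depends on both Y1 and Y2, each with slope 1.
y3 = [a + b + rng.gauss(0, 1) for a, b in zip(y1, y2)]
missing = [rng.random() < 0.5 for _ in range(n)]

slope_restricted = impute_and_analyze(y1, y2, y3, missing, False, 5, rng)
slope_full = impute_and_analyze(y1, y2, y3, missing, True, 5, rng)
print("Imputer's model Y3 = Y1:    Y2 slope =", round(slope_restricted, 3))
print("Imputer's model Y3 = Y1 Y2: Y2 slope =", round(slope_full, 3))
```

Under these assumptions, the Y2 slope obtained after imputing with Y3 = Y1 should fall well below the true value of 1, because the imputed half of the Y3 values carries no relationship with Y2. The slope obtained after imputing with Y3 = Y1 Y2 should stay close to 1.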
Thus, generally you should include as many variables as you can when doing multiple imputation. The precision you lose when you include unimportant predictors is usually a relatively small price to pay for the general validity of analyses of the resultant multiply imputed data set (Rubin 1996).
Note that it is good practice to include a description of the imputer's model with the multiply imputed data set. That way, the analysts will have information about the variables involved in the imputation and which relationships among the variables have been implicitly set to zero.
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.