The MI Procedure

Monotone Methods for Data Sets with Monotone Missing Patterns

For data sets with monotone missing data patterns, you can use monotone methods to impute the missing values. A monotone method creates multiple imputations by imputing missing values sequentially, one variable at a time.
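Monotone methods apply only when the pattern really is monotone. For variables in a given order, a pattern is monotone when a missing value for $Y_{j}$ in an observation implies that all later variables $Y_{k}$, $k > j$, are also missing in that observation. The following minimal Python sketch checks this condition on a boolean "observed" mask. The function name `is_monotone` and the mask layout are illustrative assumptions, not part of PROC MI.

```python
import numpy as np


def is_monotone(observed) -> bool:
    """Check whether a missing-data pattern is monotone for the given column order.

    observed: boolean array of shape (n, p); observed[i, j] is True when the
    j-th variable in VAR order is observed for observation i.
    """
    obs = np.asarray(observed, dtype=bool).astype(int)
    # A row is monotone when it never switches from missing (0) back to
    # observed (1), i.e. its 1/0 values are non-increasing left to right.
    return bool(np.all(np.diff(obs, axis=1) <= 0))
```

For example, rows of the form (observed, observed, missing) and (observed, missing, missing) together form a monotone pattern, whereas a row of the form (observed, missing, observed) does not.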

For example, with variables $Y_{1}$, $Y_{2}$, …, $Y_{p}$ (in that order) in the VAR statement, a monotone method sequentially simulates draws for the missing values of variables $Y_{2}$, …, $Y_{p}$. That is, the missing values are imputed by using the sequence

\begin{eqnarray*} \boldsymbol{\theta}^{(*)}_{2} & \sim & P(\, \boldsymbol{\theta}_{2} \, |\, Y_{1(\mathit{obs})}, Y_{2(\mathit{obs})}) \\ Y^{(*)}_{2} & \sim & P(\, Y_{2} \, |\, \boldsymbol{\theta}^{(*)}_{2}) \\ & \ldots & \\ & \ldots & \\ \boldsymbol{\theta}^{(*)}_{p} & \sim & P(\, \boldsymbol{\theta}_{p} \, |\, Y_{1(\mathit{obs})}, \ldots , Y_{p(\mathit{obs})} \, ) \\ Y^{(*)}_{p} & \sim & P(\, Y_{p} \, |\, \boldsymbol{\theta}^{(*)}_{p} \, ) \end{eqnarray*}

where $Y_{j(\mathit{obs})}$ is the set of observed $Y_{j}$ values, $\boldsymbol{\theta}^{(*)}_{j}$ is the set of simulated parameters for the conditional distribution of $Y_{j}$ given covariates constructed from variables $Y_{1}$, $Y_{2}$, …, $Y_{j-1}$, and $Y^{(*)}_{j}$ is the set of imputed $Y_{j}$ values.

The missing values for the leading variable $Y_{1}$ are not imputed, and missing values for $Y_{2}$, …, $Y_{p}$ are not imputed for observations with missing $Y_{1}$ values. For each subsequent variable $Y_{j}$ with missing values, the corresponding imputation method fits a model with covariates constructed from the preceding variables $Y_{1}, Y_{2}, \ldots , Y_{j-1}$. Only observations with observed $Y_{j}$ values are used in the model fitting; because the pattern is monotone, these observations also have observed values for $Y_{1}, Y_{2}, \ldots , Y_{j-1}$. From this fitted model, a new model is drawn (for the regression method, for example, new regression coefficients and a new error variance are drawn from their posterior distribution), and the drawn model is then used to impute the missing values for $Y_{j}$.
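The sequential procedure can be sketched in Python for the simplest case. This is a minimal sketch, not the PROC MI implementation: it assumes every imputed variable is continuous and uses a normal linear regression on an intercept plus all preceding variables. The model draw uses the usual Bayesian normal linear regression posterior under a noninformative prior (a scaled inverse chi-square draw for the error variance, then a multivariate normal draw for the coefficients). The helper name `monotone_regression_impute` is hypothetical. Because imputed values are written back before the next variable is processed, later variables use them as covariates, while model fitting still uses only observed $Y_{j}$ values.

```python
import numpy as np


def monotone_regression_impute(data, rng):
    """Return one imputed copy of a monotone data set (hypothetical helper).

    data: float array of shape (n, p), np.nan marking missing values, with
    columns Y_1, ..., Y_p in VAR order and a monotone missing pattern.
    Assumes every Y_j (j >= 2) is continuous and is imputed by a normal
    linear regression on an intercept plus all preceding variables.
    """
    y = np.array(data, dtype=float)  # work on a copy; the input is untouched
    n, p = y.shape
    for j in range(1, p):  # column j holds Y_{j+1}; Y_1 is never imputed
        # Rows whose preceding variables are available (observed or already imputed).
        prev_ok = ~np.isnan(y[:, :j]).any(axis=1)
        fit_rows = prev_ok & ~np.isnan(y[:, j])   # observed Y_j: used for fitting
        imp_rows = prev_ok & np.isnan(y[:, j])    # missing Y_j: to be imputed
        if not imp_rows.any():
            continue

        X = np.column_stack([np.ones(fit_rows.sum()), y[fit_rows, :j]])
        target = y[fit_rows, j]
        n_j, q = X.shape                          # q = number of regression parameters
        df = n_j - q
        if df < 1:
            raise ValueError(f"too few observations to fit column {j}")

        # Least squares fit: beta_hat, residual variance sigma2_hat, V = (X'X)^{-1}.
        V = np.linalg.inv(X.T @ X)
        beta_hat = V @ X.T @ target
        resid = target - X @ beta_hat
        sigma2_hat = resid @ resid / df

        # Draw a new model: sigma*^2 = sigma2_hat * df / g with g ~ chi-square(df),
        # then beta* ~ N(beta_hat, sigma*^2 V) via the Cholesky factor of V.
        sigma_star = np.sqrt(sigma2_hat * df / rng.chisquare(df))
        beta_star = beta_hat + sigma_star * (np.linalg.cholesky(V) @ rng.standard_normal(q))

        # Impute each missing Y_j as x' beta* plus a N(0, sigma*^2) residual.
        X_imp = np.column_stack([np.ones(imp_rows.sum()), y[imp_rows, :j]])
        noise = sigma_star * rng.standard_normal(imp_rows.sum())
        y[imp_rows, j] = X_imp @ beta_star + noise
    return y
```

Calling the function m times with the same `numpy.random.Generator` yields m imputed data sets. Rows with a missing $Y_{1}$ stay entirely missing, matching the rule that those observations are not imputed.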

You can specify a separate monotone method for each imputed variable. If no method is specified for a variable, the default method is used: a regression method for a continuous variable and a discriminant function method for a classification variable. For each imputed variable, you can also specify a set of covariates constructed from its preceding variables. If no covariates are specified for a variable, all of its preceding variables in the VAR list are used as covariates.
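The defaulting rules above can be expressed as a small resolution step. The following Python sketch is illustrative only: the function `resolve_monotone_spec`, its arguments, and the method strings are hypothetical labels, not SAS syntax. As a simplifying assumption, it accepts only plain preceding variable names as covariates and does not handle effects constructed from them (such as interactions).

```python
from typing import Dict, List, Optional, Tuple


def resolve_monotone_spec(
    var_order: List[str],
    is_classification: Dict[str, bool],
    methods: Optional[Dict[str, str]] = None,
    covariates: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Tuple[str, List[str]]]:
    """Fill in the default method and covariates for every imputed variable.

    var_order lists the variables in VAR-statement order; the first one is
    the leading variable and is not imputed, so it gets no entry.
    """
    methods = methods or {}
    covariates = covariates or {}
    spec = {}
    for j in range(1, len(var_order)):
        var = var_order[j]
        preceding = var_order[:j]
        # Default method: regression for continuous, discriminant for classification.
        default = "discriminant function" if is_classification[var] else "regression"
        method = methods.get(var, default)
        # Default covariates: all preceding variables in VAR order.
        covs = list(covariates.get(var, preceding))
        # Covariates must come from variables that precede `var`.
        invalid = [c for c in covs if c not in preceding]
        if invalid:
            raise ValueError(f"{var}: covariates {invalid} do not precede it")
        spec[var] = (method, covs)
    return spec
```

For instance, with VAR order `y1 y2 c3 y4` and only `c3` a classification variable, `y2` resolves to regression on `y1`, and `c3` resolves to a discriminant function method on `y1 y2`, unless you override either choice.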

You can use a regression method, a predictive mean matching method, or a propensity score method to impute missing values for a continuous variable; a logistic regression method for a classification variable with a binary or ordinal response; and a discriminant function method for a classification variable with a binary or nominal response. See the sections Monotone and FCS Regression Methods, Monotone and FCS Predictive Mean Matching Methods, Monotone Propensity Score Method, Monotone and FCS Discriminant Function Methods, and Monotone and FCS Logistic Regression Methods for these methods.
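The pairing of methods with variable types in the preceding paragraph amounts to a lookup table. The Python sketch below encodes that table. Note that a binary classification variable can use either the logistic regression or the discriminant function method, an ordinal one only logistic regression, and a nominal one only the discriminant function method. The names `ALLOWED_METHODS`, `allowed_methods`, and `check_method`, and the string labels, are hypothetical.

```python
from typing import List

# Which monotone methods apply to which kind of imputed variable.
# Labels are illustrative, not SAS keywords.
ALLOWED_METHODS = {
    "continuous": {"regression", "predictive mean matching", "propensity score"},
    "binary": {"logistic regression", "discriminant function"},
    "ordinal": {"logistic regression"},
    "nominal": {"discriminant function"},
}


def allowed_methods(variable_kind: str) -> List[str]:
    """Return the monotone methods usable for a variable of the given kind."""
    if variable_kind not in ALLOWED_METHODS:
        raise ValueError(f"unknown variable kind: {variable_kind!r}")
    return sorted(ALLOWED_METHODS[variable_kind])


def check_method(method: str, variable_kind: str) -> None:
    """Raise ValueError if `method` cannot impute a variable of this kind."""
    if method not in allowed_methods(variable_kind):
        raise ValueError(f"{method!r} cannot impute a {variable_kind} variable")
```

A check like this would reject, for example, a logistic regression method requested for a nominal classification variable.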