The MI Procedure

Monotone and FCS Logistic Regression Methods

The logistic regression method is another imputation method available for classification variables. In the logistic regression method, a logistic regression model is fitted for a classification variable with a set of covariates constructed from the effects. For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for the variable (Rubin, 1987, pp. 167–170).

For a binary variable $Y_{j}$ with responses 1 and 2, a logistic regression model is fitted using observations with observed values for the imputed variable $Y_{j}$ and its covariates $X_{1}$, $X_{2}$, …, $X_{k}$:

\[  \mr {logit} (p_{j}) = {\beta }_{0} + {\beta }_{1} \,  X_{1} + {\beta }_{2} \,  X_{2} + \ldots + {\beta }_{k} \,  X_{k}  \]

where $X_{1}, X_{2}, \ldots , X_{k}$ are covariates for $Y_{j}$,   $p_{j} = \mr {Pr}( Y_{j}=1 | X_{1}, X_{2}, \ldots , X_{k} )$,   and   $\mr {logit} (p) = \mr {log} ( p / (1-p) ).$

The fitted model includes the regression parameter estimates $\hat{\bbeta } = (\hat{\beta }_{0}, \hat{\beta }_{1}, \ldots , \hat{\beta }_{k})$ and the associated covariance matrix $\mb {V}_{j}$.
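
As a concrete illustration of this setup (not the internal implementation of the procedure), the following sketch fits the logistic regression for one binary variable in Python with the statsmodels package. The data, variable names, and seed are hypothetical: y holds $Y_{j}$ coded 1/2 with missing entries, and X holds the covariates. The fit supplies $\hat{\bbeta }$ and $\mb {V}_{j}$.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical example data: X holds the covariates X1, ..., Xk and
# y holds the binary variable Y_j coded 1/2, with NaN where Y_j is missing.
rng = np.random.default_rng(42)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = np.where(rng.random(n) < 0.5, 1.0, 2.0)
y[rng.random(n) < 0.2] = np.nan                 # some missing values of Y_j

obs = ~np.isnan(y)                              # observations with Y_j observed
X_obs = sm.add_constant(X[obs])                 # design matrix with intercept
y1 = (y[obs] == 1).astype(float)                # indicator that Y_j = 1

fit = sm.Logit(y1, X_obs).fit(disp=0)
beta_hat = fit.params                           # regression parameter estimates
V_j = fit.cov_params()                          # associated covariance matrix
```

Only observations with an observed $Y_{j}$ enter the fit, matching the description above.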

The following steps are used to generate imputed values for a binary variable $Y_{j}$ with responses 1 and 2 (an illustrative sketch that combines these steps follows the list):

  1. New parameters $\bbeta _{*} = ({\beta }_{*0}, {\beta }_{*1}, \ldots , {\beta }_{*(k)})$ are drawn from the posterior predictive distribution of the parameters.

    \[  \bbeta _{*} = \hat{\bbeta } + \mb {V}_{hj}' \mb {Z}  \]

    where $\mb {V}_{hj}$ is the upper triangular matrix in the Cholesky decomposition, $\mb {V}_{j} = \mb {V}_{hj}' \mb {V}_{hj}$, and $\mb {Z}$ is a vector of $k+1$ independent random normal variates.

  2. For an observation with missing $Y_{j}$ and covariates $x_{1}, x_{2}, \ldots , x_{k}$, compute the expected probability that $Y_{j}= 1$:

    \[  p_{j} = \frac{\mr {exp}({\mu }_ j)}{1+\mr {exp}({\mu }_ j)}  \]

    where ${\mu }_ j = {\beta }_{*0} + {\beta }_{*1} \,  x_{1} + {\beta }_{*2} \,  x_{2} + \ldots + {\beta }_{*(k)} \,  x_{k}$.

  3. Draw a random uniform variate, $u$, between 0 and 1. If the value of $u$ is less than $p_{j}$, impute $Y_{j}= 1$; otherwise impute $Y_{j}= 2$.
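
The sketch below continues the hypothetical Python example above and combines steps 1 through 3 in a single function. The function name impute_binary and its arguments are illustrative; they are not part of the procedure.

```python
import numpy as np

def impute_binary(beta_hat, V_j, X_mis, rng):
    """Steps 1-3 for one binary variable Y_j coded 1/2.

    beta_hat : (k+1,) parameter estimates (intercept first)
    V_j      : (k+1, k+1) covariance matrix of beta_hat
    X_mis    : (m, k+1) covariate rows (with intercept column) where Y_j is missing
    rng      : numpy random Generator
    """
    # Step 1: beta_star = beta_hat + V_hj' Z, where V_j = V_hj' V_hj.
    # numpy's Cholesky factor is lower triangular, so L plays the role of V_hj',
    # with V_j = L @ L.T.
    L = np.linalg.cholesky(V_j)
    z = rng.standard_normal(len(beta_hat))      # k+1 independent N(0,1) variates
    beta_star = beta_hat + L @ z

    # Step 2: expected probability that Y_j = 1 for each incomplete observation.
    mu = X_mis @ beta_star
    p = np.exp(mu) / (1.0 + np.exp(mu))

    # Step 3: uniform draw u; impute 1 if u < p_j, otherwise 2.
    u = rng.random(len(p))
    return np.where(u < p, 1.0, 2.0)
```

Continuing the previous sketch, `y[~obs] = impute_binary(beta_hat, V_j, sm.add_constant(X[~obs]), rng)` would fill in the missing values of $Y_{j}$ for that variable.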

The preceding logistic regression method can be extended to include ordinal classification variables that have more than two levels of responses. The ORDER= and DESCENDING options can be used to specify the sort order for the levels of the imputed variables.
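
For an ordinal classification variable $Y_{j}$ with responses $1, 2, \ldots , K$ (in the specified sort order), a standard formulation of this extension, shown here as a sketch rather than as a statement of the procedure's exact model, replaces the single logit with cumulative logits that share one set of covariate coefficients:

\[  \mr {logit} \left( \mr {Pr}( Y_{j} \le i | X_{1}, X_{2}, \ldots , X_{k} ) \right) = {\alpha }_{i} + {\beta }_{1} \,  X_{1} + {\beta }_{2} \,  X_{2} + \ldots + {\beta }_{k} \,  X_{k}, \quad i = 1, 2, \ldots , K-1  \]

Imputation can then proceed analogously to steps 1 through 3: new parameters are drawn from the posterior predictive distribution, the cumulative probabilities are computed for each incomplete observation, and a single uniform variate determines which of the $K$ levels is imputed.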