The MI Procedure

Monotone and FCS Logistic Regression Methods

The logistic regression method is another imputation method available for classification variables. In the logistic regression method, a logistic regression model is fitted for a classification variable with a set of covariates constructed from the effects. For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for the variable (Rubin, 1987, pp. 167–170).

For a binary variable $Y_{j}$ with responses 1 and 2, a logistic regression model is fitted using observations with observed values for the imputed variable $Y_{j}$ and its covariates $X_{1}$, $X_{2}$, …, $X_{k}$:

\[  \mr {logit} (p_{j}) = {\beta }_{0} + {\beta }_{1} \,  X_{1} + {\beta }_{2} \,  X_{2} + \ldots + {\beta }_{k} \,  X_{k}  \]

where $X_{1}, X_{2}, \ldots , X_{k}$ are covariates for $Y_{j}$,   $p_{j} = \mr {Pr}( Y_{j}=1 | X_{1}, X_{2}, \ldots , X_{k} )$,   and   $\mr {logit} (p) = \mr {log} ( p / (1-p) ).$

The fitted model includes the regression parameter estimates $\hat{\bbeta } = (\hat{\beta }_{0}, \hat{\beta }_{1}, \ldots , \hat{\beta }_{k})$ and the associated covariance matrix $\mb {V}_{j}$.
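
As a concrete illustration of this setup (not the internal implementation of the procedure), the following sketch fits the logistic regression for one binary variable in Python with the statsmodels package. The data, variable names, and seed are hypothetical: y holds $Y_{j}$ coded 1/2 with missing entries, and X holds the covariates. The fit supplies $\hat{\bbeta }$ and $\mb {V}_{j}$.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical example data: X holds the covariates X1, ..., Xk and
# y holds the binary variable Y_j coded 1/2, with NaN where Y_j is missing.
rng = np.random.default_rng(42)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = np.where(rng.random(n) < 0.5, 1.0, 2.0)
y[rng.random(n) < 0.2] = np.nan                 # some missing values of Y_j

obs = ~np.isnan(y)                              # observations with Y_j observed
X_obs = sm.add_constant(X[obs])                 # design matrix with intercept
y1 = (y[obs] == 1).astype(float)                # indicator that Y_j = 1

fit = sm.Logit(y1, X_obs).fit(disp=0)
beta_hat = fit.params                           # regression parameter estimates
V_j = fit.cov_params()                          # associated covariance matrix
```

Only observations with an observed $Y_{j}$ enter the fit, matching the description above.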

The following steps are used to generate imputed values for a binary variable $Y_{j}$ with responses 1 and 2 (an illustrative sketch that combines these steps follows the list):

  1. New parameters $\bbeta _{*} = ({\beta }_{*0}, {\beta }_{*1}, \ldots , {\beta }_{*(k)})$ are drawn from the posterior predictive distribution of the parameters.

    \[  \bbeta _{*} = \hat{\bbeta } + \mb {V}_{hj}' \mb {Z}  \]

    where $\mb {V}_{hj}$ is the upper triangular matrix in the Cholesky decomposition, $\mb {V}_{j} = \mb {V}_{hj}' \mb {V}_{hj}$, and $\mb {Z}$ is a vector of $k+1$ independent random normal variates.

  2. For an observation with missing $Y_{j}$ and covariates $x_{1}, x_{2}, \ldots , x_{k}$, compute the expected probability that $Y_{j}= 1$:

    \[  p_{j} = \frac{\mr {exp}({\mu }_ j)}{1+\mr {exp}({\mu }_ j)}  \]

    where ${\mu }_ j = {\beta }_{*0} + {\beta }_{*1} \,  x_{1} + {\beta }_{*2} \,  x_{2} + \ldots + {\beta }_{*(k)} \,  x_{k}$.

  3. Draw a random uniform variate, $u$, between 0 and 1. If the value of $u$ is less than $p_{j}$, impute $Y_{j}= 1$; otherwise impute $Y_{j}= 2$.
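
The sketch below continues the hypothetical Python example above and combines steps 1 through 3 in a single function. The function name impute_binary and its arguments are illustrative; they are not part of the procedure.

```python
import numpy as np

def impute_binary(beta_hat, V_j, X_mis, rng):
    """Steps 1-3 for one binary variable Y_j coded 1/2.

    beta_hat : (k+1,) parameter estimates (intercept first)
    V_j      : (k+1, k+1) covariance matrix of beta_hat
    X_mis    : (m, k+1) covariate rows (with intercept column) where Y_j is missing
    rng      : numpy random Generator
    """
    # Step 1: beta_star = beta_hat + V_hj' Z, where V_j = V_hj' V_hj.
    # numpy's Cholesky factor is lower triangular, so L plays the role of V_hj',
    # with V_j = L @ L.T.
    L = np.linalg.cholesky(V_j)
    z = rng.standard_normal(len(beta_hat))      # k+1 independent N(0,1) variates
    beta_star = beta_hat + L @ z

    # Step 2: expected probability that Y_j = 1 for each incomplete observation.
    mu = X_mis @ beta_star
    p = np.exp(mu) / (1.0 + np.exp(mu))

    # Step 3: uniform draw u; impute 1 if u < p_j, otherwise 2.
    u = rng.random(len(p))
    return np.where(u < p, 1.0, 2.0)
```

Continuing the previous sketch, `y[~obs] = impute_binary(beta_hat, V_j, sm.add_constant(X[~obs]), rng)` would fill in the missing values of $Y_{j}$ for that variable.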

The preceding logistic regression method can be extended to include ordinal classification variables that have more than two levels of responses. The ORDER= and DESCENDING options can be used to specify the sort order for the levels of the imputed variables.
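
For an ordinal classification variable $Y_{j}$ with responses $1, 2, \ldots , K$ (in the specified sort order), a standard formulation of this extension, shown here as a sketch rather than as a statement of the procedure's exact model, replaces the single logit with cumulative logits that share one set of covariate coefficients:

\[  \mr {logit} \left( \mr {Pr}( Y_{j} \le i | X_{1}, X_{2}, \ldots , X_{k} ) \right) = {\alpha }_{i} + {\beta }_{1} \,  X_{1} + {\beta }_{2} \,  X_{2} + \ldots + {\beta }_{k} \,  X_{k}, \quad i = 1, 2, \ldots , K-1  \]

Imputation can then proceed analogously to steps 1 through 3: new parameters are drawn from the posterior predictive distribution, the cumulative probabilities are computed for each incomplete observation, and a single uniform variate determines which of the $K$ levels is imputed.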