# The MI Procedure

### Monotone and FCS Logistic Regression Methods


The logistic regression method is another imputation method available for classification variables. In the logistic regression method, a logistic regression model is fitted for a classification variable with a set of covariates constructed from the effects, where the classification variable is an ordinal response or a nominal response variable.

In the MI procedure, ordered values are assigned to response levels in ascending sorted order. If the response variable $Y$ takes values in $\{1, \ldots, K\}$, then for ordinal response models, the cumulative model has the form

$$\operatorname{logit}\bigl(\Pr(Y \le j \mid \mathbf{x})\bigr) = \alpha_j + \boldsymbol{\beta}'\mathbf{x}, \quad j = 1, \ldots, K-1$$

where $\alpha_1, \ldots, \alpha_{K-1}$ are $K-1$ intercept parameters, and $\boldsymbol{\beta}$ is the vector of slope parameters.

For nominal response logistic models, where the $K$ possible responses have no natural ordering, the generalized logit model has the form

$$\log\!\left(\frac{\Pr(Y = j \mid \mathbf{x})}{\Pr(Y = K \mid \mathbf{x})}\right) = \alpha_j + \boldsymbol{\beta}_j'\mathbf{x}, \quad j = 1, \ldots, K-1$$

where the $\alpha_j$ are $K-1$ intercept parameters, and the $\boldsymbol{\beta}_j$ are $K-1$ vectors of slope parameters.
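To make the two link functions concrete, the sketch below (Python with NumPy, not part of the MI procedure itself) converts the linear predictors of each model into cell probabilities $\Pr(Y=j)$; the intercepts, slopes, and covariate values are made-up illustrative numbers.

```python
import numpy as np

def cumulative_logit_probs(alpha, beta, x):
    """Cell probabilities Pr(Y = j) under the cumulative (ordinal) logit model.

    alpha: (K-1,) increasing intercepts; beta: (k,) shared slopes; x: (k,) covariates.
    """
    eta = alpha + x @ beta                  # linear predictors, shape (K-1,)
    F = 1.0 / (1.0 + np.exp(-eta))          # Pr(Y <= j), j = 1, ..., K-1
    F = np.concatenate([[0.0], F, [1.0]])   # pad with Pr(Y <= 0) = 0, Pr(Y <= K) = 1
    return np.diff(F)                       # Pr(Y = j) = F_j - F_{j-1}

def generalized_logit_probs(alpha, beta, x):
    """Cell probabilities Pr(Y = j) under the generalized (nominal) logit model.

    alpha: (K-1,) intercepts; beta: (K-1, k) slope vectors; x: (k,) covariates.
    Level K is the reference level.
    """
    eta = alpha + beta @ x                  # mu_j, j = 1, ..., K-1
    e = np.exp(eta)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)  # last entry is Pr(Y = K)

# Illustrative parameters for K = 3 levels and k = 2 covariates:
x = np.array([0.5, -1.0])
p_ord = cumulative_logit_probs(np.array([-1.0, 1.0]), np.array([0.3, 0.2]), x)
p_nom = generalized_logit_probs(np.array([-1.0, 1.0]),
                                np.array([[0.3, 0.2], [-0.1, 0.4]]), x)
```

In both cases the $K$ cell probabilities are nonnegative and sum to 1; the ordinal model shares one slope vector across levels, while the nominal model carries a separate slope vector per non-reference level.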

#### Binary Response Logistic Regression

For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable (Rubin, 1987, pp. 167–170).

For a binary variable $Y$ with responses 1 and 2, a logistic regression model is fitted using observations with observed values for the imputed variable $Y$:

$$\operatorname{logit}(p_j) = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_k x_{kj}$$

where $x_{1j}, \ldots, x_{kj}$ are covariates for $Y_j$, $p_j = \Pr(Y_j = 1 \mid x_{1j}, \ldots, x_{kj})$, and $\operatorname{logit}(p) = \log\bigl(p/(1-p)\bigr)$. The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$ and the associated covariance matrix $\mathbf{V}$.

The following steps are used to generate imputed values for a binary variable Y with responses 1 and 2:

1. New parameters $\boldsymbol{\beta}_* = \hat{\boldsymbol{\beta}} + \mathbf{V}_h'\mathbf{Z}$ are drawn from the posterior predictive distribution of the parameters, where $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of $k+1$ independent random normal variates.

2. For an observation with missing $Y_j$ and covariates $x_{1j}, \ldots, x_{kj}$, compute the predicted probability that $Y_j = 1$:

   $$p_j = \frac{\exp(\mu_j)}{1 + \exp(\mu_j)}$$

   where $\mu_j = (1, x_{1j}, \ldots, x_{kj})\,\boldsymbol{\beta}_*$.

3. Draw a random uniform variate, $u$, between 0 and 1. If $u < p_j$, impute $Y_j = 1$; otherwise impute $Y_j = 2$.
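The three steps above can be sketched as follows. This is a minimal Python/NumPy illustration of the algorithm, not PROC MI's implementation; `beta_hat`, `V`, and the covariate values are assumed illustrative quantities rather than output from a real fit.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed fitted quantities (illustrative values, not from a real fit):
# beta_hat = (intercept, slope) and its covariance matrix V.
beta_hat = np.array([0.2, -0.8])
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])

# Step 1: draw beta* = beta_hat + V_h' Z, where V = V_h' V_h is the
# Cholesky decomposition with V_h upper triangular.
V_h = np.linalg.cholesky(V).T             # upper triangular factor
beta_star = beta_hat + V_h.T @ rng.standard_normal(2)

# Step 2: for each observation with missing Y, compute p = Pr(Y = 1).
x_missing = np.array([0.3, 1.5, -0.7])    # covariate values where Y is missing
mu = beta_star[0] + beta_star[1] * x_missing
p = np.exp(mu) / (1.0 + np.exp(mu))

# Step 3: draw u ~ Uniform(0, 1); impute Y = 1 if u < p, else Y = 2.
u = rng.uniform(size=x_missing.size)
y_imputed = np.where(u < p, 1, 2)
```

Because a fresh `beta_star` is drawn from the approximate posterior before imputing, repeated imputations reflect uncertainty in the regression parameters as well as in the binomial draw itself.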

The binary logistic regression imputation method extends to ordinal classification variables with more than two response levels and to nominal classification variables. The LINK=LOGIT and LINK=GLOGIT options specify the cumulative logit model and the generalized logit model, respectively, and the ORDER= and DESCENDING options specify the sort order for the levels of the imputed variables.

#### Ordinal Response Logistic Regression

For an ordinal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable $Y$ with ordinal responses $1, 2, \ldots, K$, a logistic regression model is fitted using observations with observed values for the imputed variable $Y$:

$$\operatorname{logit}\bigl(\Pr(Y_j \le m \mid \mathbf{x}_j)\bigr) = \alpha_m + \mathbf{x}_j'\boldsymbol{\beta}, \quad m = 1, \ldots, K-1$$

where $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})'$ are covariates for $Y_j$ and $\operatorname{logit}(p) = \log\bigl(p/(1-p)\bigr)$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})$ and $\hat{\boldsymbol{\beta}}$, and their associated covariance matrix $\mathbf{V}$.

The following steps are used to generate imputed values for an ordinal classification variable Y with responses 1, 2, …, K:

1. New parameters $(\boldsymbol{\alpha}_*, \boldsymbol{\beta}_*) = (\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\beta}}) + \mathbf{V}_h'\mathbf{Z}$ are drawn from the posterior predictive distribution of the parameters, where $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of independent random normal variates.

2. For an observation with missing $Y_j$ and covariates $\mathbf{x}_j$, compute the predicted cumulative probability for $Y_j \le m$:

   $$F_m = \Pr(Y_j \le m \mid \mathbf{x}_j) = \frac{\exp(\mu_m)}{1 + \exp(\mu_m)}$$

   where $\mu_m = \alpha_{*m} + \mathbf{x}_j'\boldsymbol{\beta}_*$ for $m = 1, \ldots, K-1$.

3. Draw a random uniform variate, $u$, between 0 and 1, then impute

   $$Y_j = \begin{cases} 1 & \text{if } u < F_1 \\ m & \text{if } F_{m-1} \le u < F_m, \; 1 < m < K \\ K & \text{if } F_{K-1} \le u \end{cases}$$

#### Nominal Response Logistic Regression

For a nominal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable $Y$ with nominal responses $1, 2, \ldots, K$, a logistic regression model is fitted using observations with observed values for the imputed variable $Y$:

$$\log\!\left(\frac{\Pr(Y_j = m \mid \mathbf{x}_j)}{\Pr(Y_j = K \mid \mathbf{x}_j)}\right) = \alpha_m + \mathbf{x}_j'\boldsymbol{\beta}_m, \quad m = 1, \ldots, K-1$$

where $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})'$ are covariates for $Y_j$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})$ and $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_{K-1})$, and their associated covariance matrix $\mathbf{V}$.

The following steps are used to generate imputed values for a nominal classification variable Y with responses 1, 2, …, K:

1. New parameters $(\boldsymbol{\alpha}_*, \boldsymbol{\beta}_{*1}, \ldots, \boldsymbol{\beta}_{*,K-1}) = (\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_{K-1}) + \mathbf{V}_h'\mathbf{Z}$ are drawn from the posterior predictive distribution of the parameters, where $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of independent random normal variates.

2. For an observation with missing $Y_j$ and covariates $\mathbf{x}_j$, compute the predicted probability for $Y_j = m$, $m = 1, 2, \ldots, K-1$:

   $$p_m = \frac{\exp(\mu_m)}{1 + \sum_{l=1}^{K-1}\exp(\mu_l)}$$

   where $\mu_m = \alpha_{*m} + \mathbf{x}_j'\boldsymbol{\beta}_{*m}$, and

   $$p_K = \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\mu_l)}$$

3. Compute the cumulative probability for $Y_j \le m$:

   $$F_m = \sum_{l=1}^{m} p_l$$

4. Draw a random uniform variate, $u$, between 0 and 1, then impute

   $$Y_j = \begin{cases} 1 & \text{if } u < F_1 \\ m & \text{if } F_{m-1} \le u < F_m, \; 1 < m < K \\ K & \text{if } F_{K-1} \le u \end{cases}$$
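The nominal-response steps above can be sketched as follows in Python/NumPy. This is an illustrative reading of the algorithm, not PROC MI's implementation; all fitted quantities (`theta_hat`, `V`) are made-up values, and `V` is taken diagonal purely for simplicity.

```python
import numpy as np

rng = np.random.default_rng(7)

K = 3          # response levels 1, 2, 3; level K is the reference
k = 2          # number of covariates

# Assumed fitted quantities (illustrative values, not from a real fit):
# theta_hat stacks (alpha_1, ..., alpha_{K-1}, beta_1, ..., beta_{K-1}).
theta_hat = np.array([0.5, -0.2,          # alpha_1, alpha_2
                      0.3, -0.6,          # beta_1 (k values)
                      0.1,  0.4])         # beta_2 (k values)
V = np.diag(np.full(theta_hat.size, 0.05))

# Step 1: theta* = theta_hat + V_h' Z, with V = V_h' V_h (V_h upper triangular).
V_h = np.linalg.cholesky(V).T
theta_star = theta_hat + V_h.T @ rng.standard_normal(theta_hat.size)
alpha_star = theta_star[:K - 1]
beta_star = theta_star[K - 1:].reshape(K - 1, k)

# Step 2: predicted probabilities for one observation with missing Y.
x = np.array([0.8, -0.5])
mu = alpha_star + beta_star @ x
e = np.exp(mu)
p = np.append(e, 1.0) / (1.0 + e.sum())   # p_1, ..., p_{K-1}, p_K

# Step 3: cumulative probabilities F_m.
F = np.cumsum(p)

# Step 4: invert a uniform draw against F (smallest m with u < F_m).
u = rng.uniform()
y_imputed = min(int(np.searchsorted(F, u, side="right")) + 1, K)
```

The ordinal method differs only in step 2: it computes each $F_m$ directly from a shared slope vector via the cumulative logit, instead of summing per-level probabilities from separate slope vectors.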