Monotone and FCS Logistic Regression Methods

Subsections:

- Binary Response Logistic Regression
- Ordinal Response Logistic Regression
- Nominal Response Logistic Regression

The logistic regression method is another imputation method available for classification variables. In the logistic regression method, a logistic regression model is fitted for a classification variable, treated as an ordinal or a nominal response, with a set of covariates constructed from the effects.

In the MI procedure, ordered values are assigned to response levels in ascending sorted order. If the response variable Y takes values in \{1, \ldots, K\}, then for ordinal response models, the cumulative model has the form

\mathrm{logit}( \Pr(Y \le j \mid x) ) = \alpha_j + \beta' x, \quad j = 1, \ldots, K-1

where \alpha_1 < \alpha_2 < \cdots < \alpha_{K-1} are K-1 intercept parameters, and \beta is the vector of slope parameters.

For nominal response logistic models, where the K possible responses have no natural ordering, the generalized logit model has the form

\log( \Pr(Y = j \mid x) \, / \, \Pr(Y = K \mid x) ) = \alpha_j + \beta_j' x, \quad j = 1, \ldots, K-1

where the \alpha_j are K-1 intercept parameters, and the \beta_j are K-1 vectors of slope parameters.
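To make the two parameterizations concrete, the following Python sketch (NumPy; the helper names and parameter values are illustrative, not part of the MI procedure) computes the K category probabilities implied by each model for K = 3 levels:

```python
import numpy as np

def cumulative_logit_probs(alpha, beta, x):
    """Pr(Y = j) under the cumulative model logit(Pr(Y <= j)) = alpha_j + beta'x."""
    eta = np.asarray(alpha) + np.dot(beta, x)      # K-1 linear predictors
    cum = 1.0 / (1.0 + np.exp(-eta))               # Pr(Y <= j), j = 1, ..., K-1
    cum = np.concatenate([[0.0], cum, [1.0]])      # F(0) = 0, F(K) = 1
    return np.diff(cum)                            # Pr(Y = j) = F(j) - F(j-1)

def generalized_logit_probs(alpha, B, x):
    """Pr(Y = j) under the generalized logit model with reference level K."""
    eta = np.asarray(alpha) + np.asarray(B) @ x    # K-1 linear predictors
    num = np.concatenate([np.exp(eta), [1.0]])     # level K: exp(0) = 1
    return num / num.sum()                         # normalize over all K levels

# Illustrative values: K = 3 response levels, two covariates
x = np.array([0.5, -1.0])
p_ord = cumulative_logit_probs(alpha=[-0.5, 1.0], beta=np.array([0.8, -0.3]), x=x)
p_nom = generalized_logit_probs(alpha=[-0.5, 1.0], B=[[0.8, -0.3], [0.2, 0.4]], x=x)
```

In both cases the K probabilities are nonnegative and sum to 1; the cumulative model shares one slope vector \beta across levels, while the generalized logit model fits a separate \beta_j for each non-reference level.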

Binary Response Logistic Regression

For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable (Rubin, 1987, pp. 167–170).

For a binary variable Y with responses 1 and 2, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

\mathrm{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k

where X_1, X_2, \ldots, X_k are covariates for Y, p = \Pr(Y = 1 \mid X_1, X_2, \ldots, X_k), and \mathrm{logit}(p) = \log( p / (1-p) ).

The fitted model includes the regression parameter estimates \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)' and the associated covariance matrix V.

The following steps are used to generate imputed values for a binary variable Y with responses 1 and 2:

1. New parameters \beta_* = (\beta_{*0}, \beta_{*1}, \ldots, \beta_{*k})' are drawn from the posterior predictive distribution of the parameters:

\beta_* = \hat{\beta} + V_h' Z

where V_h is the upper triangular matrix in the Cholesky decomposition, V = V_h' V_h, and Z is a vector of k+1 independent random normal variates.

2. For an observation with missing Y and covariates x_1, x_2, \ldots, x_k, compute the predicted probability that Y = 1:

p = \exp(\mu) / ( 1 + \exp(\mu) )

where \mu = \beta_{*0} + \beta_{*1} x_1 + \cdots + \beta_{*k} x_k.

3. Draw a random uniform variate, u, between 0 and 1. If u < p, impute Y = 1; otherwise impute Y = 2.
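The three steps above can be sketched in Python with NumPy. This is a hedged illustration, not SAS's internal code: the fitted estimates beta_hat and covariance V are assumed to be available as arrays, and the function name impute_binary is hypothetical.

```python
import numpy as np

def impute_binary(beta_hat, V, X_miss, rng):
    """Impute a binary Y (coded 1/2); beta_hat includes the intercept first."""
    # Step 1: beta_* = beta_hat + V_h' Z, where V = V_h' V_h (Cholesky)
    Vh = np.linalg.cholesky(V).T                   # upper triangular factor
    beta_star = beta_hat + Vh.T @ rng.standard_normal(beta_hat.size)
    # Step 2: p = exp(mu) / (1 + exp(mu)), mu = beta_*0 + beta_*1 x1 + ...
    mu = beta_star[0] + X_miss @ beta_star[1:]
    p = 1.0 / (1.0 + np.exp(-mu))
    # Step 3: u < p -> impute Y = 1, otherwise Y = 2
    u = rng.uniform(size=len(X_miss))
    return np.where(u < p, 1, 2)

rng = np.random.default_rng(42)
beta_hat = np.array([0.2, 1.0, -0.5])              # illustrative estimates
V = 0.01 * np.eye(3)                               # illustrative covariance
X_miss = rng.standard_normal((5, 2))               # 5 observations missing Y
y_imp = impute_binary(beta_hat, V, X_miss, rng)
```

Drawing new parameters before imputing (rather than reusing \hat{\beta}) is what propagates parameter uncertainty into the imputations, as required for proper multiple imputation.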

The binary logistic regression imputation method extends to ordinal classification variables with more than two response levels and to nominal classification variables. The LINK=LOGIT and LINK=GLOGIT options specify the cumulative logit model and the generalized logit model, respectively. The ORDER= and DESCENDING options specify the sort order for the levels of the imputed variables.

Ordinal Response Logistic Regression

For an ordinal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable Y with ordinal responses 1, 2, \ldots, K, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

\mathrm{logit}( \Pr(Y \le j \mid x) ) = \alpha_j + \beta' x, \quad j = 1, \ldots, K-1

where x = (x_1, x_2, \ldots, x_k)' are covariates for Y and \beta is the vector of slope parameters.

The fitted model includes the regression parameter estimates \hat{\alpha} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})' and \hat{\beta}, and their associated covariance matrix V.

The following steps are used to generate imputed values for an ordinal classification variable Y with responses 1, 2, …, K:

1. New parameters \gamma_* are drawn from the posterior predictive distribution of the parameters:

\gamma_* = \hat{\gamma} + V_h' Z

where \hat{\gamma} = (\hat{\alpha}', \hat{\beta}')', V_h is the upper triangular matrix in the Cholesky decomposition, V = V_h' V_h, and Z is a vector of independent random normal variates.

2. For an observation with missing Y and covariates x, compute the predicted cumulative probability for Y \le j:

F(j) = \Pr(Y \le j) = \exp(\mu_j) / ( 1 + \exp(\mu_j) )

where \mu_j = \alpha_{*j} + \beta_*' x for j = 1, \ldots, K-1, with F(0) = 0 and F(K) = 1.

3. Draw a random uniform variate, u, between 0 and 1, then impute Y = j if F(j-1) \le u < F(j).
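A minimal Python sketch of these three steps, assuming the estimates alpha_hat and beta_hat and their joint covariance V are available as arrays (the names are illustrative; with a small V the drawn intercepts remain ordered, which the sketch relies on):

```python
import numpy as np

def impute_ordinal(alpha_hat, beta_hat, V, X_miss, rng):
    """Impute ordinal Y in {1, ..., K} from a cumulative logit fit."""
    # Step 1: gamma_* = gamma_hat + V_h' Z with gamma_hat = (alpha_hat', beta_hat')'
    gamma_hat = np.concatenate([alpha_hat, beta_hat])
    Vh = np.linalg.cholesky(V).T                   # V = V_h' V_h
    gamma_star = gamma_hat + Vh.T @ rng.standard_normal(gamma_hat.size)
    K1 = len(alpha_hat)                            # K - 1 intercepts
    alpha_star, beta_star = gamma_star[:K1], gamma_star[K1:]
    out = []
    for x in X_miss:
        # Step 2: cumulative probabilities F(j) = Pr(Y <= j), F(0) = 0, F(K) = 1
        eta = alpha_star + beta_star @ x
        F = np.concatenate([[0.0], 1.0 / (1.0 + np.exp(-eta)), [1.0]])
        # Step 3: impute Y = j where F(j-1) <= u < F(j)
        u = rng.uniform()
        out.append(int(np.searchsorted(F, u, side="right")))
    return out

rng = np.random.default_rng(7)
alpha_hat = np.array([-0.5, 1.0])                  # K = 3: two intercepts
beta_hat = np.array([0.8, -0.3])
V = 0.01 * np.eye(4)                               # illustrative covariance
y_ord = impute_ordinal(alpha_hat, beta_hat, V, rng.standard_normal((4, 2)), rng)
```

The searchsorted call inverts the cumulative distribution: it returns the smallest j with u < F(j), which is exactly the imputation rule in step 3.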

Nominal Response Logistic Regression

For a nominal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable Y with nominal responses 1, 2, \ldots, K, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

\log( \Pr(Y = j \mid x) \, / \, \Pr(Y = K \mid x) ) = \alpha_j + \beta_j' x, \quad j = 1, \ldots, K-1

where x = (x_1, x_2, \ldots, x_k)' are covariates for Y and level K is the reference level.

The fitted model includes the regression parameter estimates \hat{\alpha} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})' and \hat{\beta}, and their associated covariance matrix V, where \hat{\beta} = (\hat{\beta}_1', \hat{\beta}_2', \ldots, \hat{\beta}_{K-1}')'.

The following steps are used to generate imputed values for a nominal classification variable Y with responses 1, 2, …, K:

1. New parameters \gamma_* are drawn from the posterior predictive distribution of the parameters:

\gamma_* = \hat{\gamma} + V_h' Z

where \hat{\gamma} = (\hat{\alpha}', \hat{\beta}')', V_h is the upper triangular matrix in the Cholesky decomposition, V = V_h' V_h, and Z is a vector of independent random normal variates.

2. For an observation with missing Y and covariates x, compute the predicted probability for Y = j, j = 1, 2, \ldots, K-1:

\Pr(Y = j) = \exp(\mu_j) / ( 1 + \sum_{l=1}^{K-1} \exp(\mu_l) )

where \mu_j = \alpha_{*j} + \beta_{*j}' x, and

\Pr(Y = K) = 1 / ( 1 + \sum_{l=1}^{K-1} \exp(\mu_l) )

3. Compute the cumulative probability for Y \le j:

F(j) = \sum_{l=1}^{j} \Pr(Y = l), \quad j = 1, \ldots, K

4. Draw a random uniform variate, u, between 0 and 1, then impute Y = j if F(j-1) \le u < F(j), where F(0) = 0.
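The four steps can be sketched the same way in Python, again assuming the estimates and their joint covariance are available as arrays (the names are illustrative; parameters are ordered as all K-1 intercepts followed by the rows of the slope matrix):

```python
import numpy as np

def impute_nominal(alpha_hat, B_hat, V, X_miss, rng):
    """Impute nominal Y in {1, ..., K}; B_hat is (K-1, k), reference level K."""
    K1, k = B_hat.shape
    # Step 1: gamma_* = gamma_hat + V_h' Z over all intercepts and slopes
    gamma_hat = np.concatenate([alpha_hat, B_hat.ravel()])
    Vh = np.linalg.cholesky(V).T                   # V = V_h' V_h
    gamma_star = gamma_hat + Vh.T @ rng.standard_normal(gamma_hat.size)
    alpha_star = gamma_star[:K1]
    B_star = gamma_star[K1:].reshape(K1, k)
    out = []
    for x in X_miss:
        # Step 2: Pr(Y=j) = exp(mu_j)/(1 + sum_l exp(mu_l)); Pr(Y=K) = 1/(1 + ...)
        mu = alpha_star + B_star @ x
        num = np.concatenate([np.exp(mu), [1.0]])
        p = num / num.sum()
        # Step 3: cumulative probabilities F(j); Step 4: invert with u
        F = np.concatenate([[0.0], np.cumsum(p)])
        F[-1] = 1.0                                # guard against rounding
        u = rng.uniform()
        out.append(int(np.searchsorted(F, u, side="right")))
    return out

rng = np.random.default_rng(11)
alpha_hat = np.array([-0.5, 1.0])                  # K = 3
B_hat = np.array([[0.8, -0.3], [0.2, 0.4]])        # one slope vector per level
V = 0.01 * np.eye(6)                               # 2 intercepts + 4 slopes
y_nom = impute_nominal(alpha_hat, B_hat, V, rng.standard_normal((4, 2)), rng)
```

Unlike the ordinal sketch, each non-reference level carries its own slope vector \beta_{*j}, so the drawn parameter vector has (K-1)(k+1) components.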