The MI Procedure

Producing Monotone Missingness with the MCMC Method

The monotone data MCMC method was first proposed by Li (1988), and Liu (1993) described the algorithm. The method is especially useful when a data set is close to having a monotone missing pattern. In this case, the method needs to impute only a few missing values for the imputed data set to have a monotone missing pattern. Compared to a full data imputation that imputes all missing values, the monotone data MCMC method imputes fewer missing values in each iteration and achieves approximate stationarity in fewer iterations (Schafer 1997, p. 227).

You can request the monotone MCMC method by specifying the option IMPUTE=MONOTONE in the MCMC statement. The "Missing Data Patterns" table now denotes the variables with missing values by "." or "O". A "." means that the variable is missing and will be imputed and an "O" means that the variable is missing and will not be imputed. The tables of "Multiple Imputation Variance Information" and "Multiple Imputation Parameter Estimates" are not created.

You must specify the variables in the VAR statement. The variable order in the list determines the monotone missing pattern in the imputed data set. With a different order in the VAR list, the results will be different because the monotone missing pattern to be constructed will be different.
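The effect of the variable order can be illustrated with a minimal Python sketch. It is not the PROC MI implementation; the function name `mark_pattern` and the "X" marker for observed values are assumptions made for illustration. For one observation, every missing value that precedes the last observed variable in the VAR order must be imputed (".") to reach a monotone pattern, while missing values after the last observed variable can stay missing ("O").

```python
def mark_pattern(missing_row):
    """Mark one observation's variables, given in VAR-list order.

    missing_row: list of booleans, True where the value is missing.
    Returns a string with 'X' for observed, '.' for missing and imputed,
    and 'O' for missing and not imputed.
    """
    # Position of the last observed variable (-1 if nothing is observed).
    last_obs = max((k for k, m in enumerate(missing_row) if not m), default=-1)
    marks = []
    for k, is_missing in enumerate(missing_row):
        if not is_missing:
            marks.append("X")
        elif k < last_obs:
            marks.append(".")  # must be filled in to make the row monotone
        else:
            marks.append("O")  # trailing missing value, left as is
    return "".join(marks)


# Hypothetical observation: y1 missing, y2 observed, y3 missing.
print(mark_pattern([True, False, True]))   # order y1 y2 y3 -> ".XO"
print(mark_pattern([False, True, True]))   # order y2 y1 y3 -> "XOO"
```

With the order y1 y2 y3, one value has to be imputed for this observation; with the order y2 y1 y3, the same observation is already monotone and nothing is imputed.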

Assuming that the data are from a multivariate normal distribution, the monotone MCMC method, like the MCMC method, repeats the following steps:

1. The imputation I-step: Given an estimated mean vector and covariance matrix, the I-step simulates the missing values for each observation independently. Only a subset of the missing values is simulated to achieve a monotone pattern of missingness.

2. The posterior P-step: Given a new sample with a monotone pattern of missingness, the P-step simulates the posterior population mean vector and covariance matrix with a noninformative Jeffreys prior. These new estimates are then used in the next I-step.

Imputation Step

The I-step is almost identical to the I-step described in the "MCMC Method for Arbitrary Missing Data" section except that here only a subset of missing values needs to be simulated. To state this precisely, denote the variables with observed values for observation i by Y_{i(obs)} and the variables with missing values by Y_{i(mis)} = (Y_{i(m1)}, Y_{i(m2)}), where Y_{i(m1)} is the subset of the missing variables that will result in monotone missingness when their values are imputed. Then the I-step draws values for Y_{i(m1)} from the conditional distribution of Y_{i(m1)} given Y_{i(obs)}.
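The conditional draw in the I-step can be sketched in Python as follows. This is a minimal illustration under the multivariate normal assumption, not the PROC MI code. It uses the standard result that the conditional distribution of the m1 variables given the observed variables is normal with mean mu_m1 + S_m1,obs S_obs,obs^{-1} (y_obs - mu_obs) and covariance S_m1,m1 - S_m1,obs S_obs,obs^{-1} S_obs,m1. All function names, index lists, and numeric values are hypothetical.

```python
import math
import random


def cholesky_lower(a):
    """Lower triangular L with a = L L' for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - s)
            else:
                low[i][k] = (a[i][k] - s) / low[k][k]
    return low


def solve_spd(a, b):
    """Solve a x = b for symmetric positive definite a via its Cholesky factor."""
    low = cholesky_lower(a)
    n = len(a)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L' x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def conditional_normal(mu, sigma, obs_idx, m1_idx, y_obs):
    """Mean vector and covariance matrix of Y[m1] given Y[obs] = y_obs."""
    s_oo = [[sigma[a][b] for b in obs_idx] for a in obs_idx]
    resid = [y - mu[a] for y, a in zip(y_obs, obs_idx)]
    w = solve_spd(s_oo, resid)  # S_oo^{-1} (y_obs - mu_obs)
    mean = [mu[a] + sum(sigma[a][b] * w[k] for k, b in enumerate(obs_idx))
            for a in m1_idx]
    cov = []
    for a in m1_idx:
        u = solve_spd(s_oo, [sigma[b][a] for b in obs_idx])  # S_oo^{-1} S_{obs,a}
        cov.append([sigma[a][c] - sum(sigma[c][b] * u[k] for k, b in enumerate(obs_idx))
                    for c in m1_idx])
    return mean, cov


def draw_i_step(mu, sigma, obs_idx, m1_idx, y_obs, rng):
    """Draw values for the m1 variables of one observation."""
    mean, cov = conditional_normal(mu, sigma, obs_idx, m1_idx, y_obs)
    low = cholesky_lower(cov)
    z = [rng.gauss(0.0, 1.0) for _ in m1_idx]
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mean))]


# Hypothetical current parameter estimates for three variables.
mu = [0.0, 1.0, 2.0]
sigma = [[1.0, 0.5, 0.2], [0.5, 2.0, 0.3], [0.2, 0.3, 1.5]]
# Observation with Y1 and Y3 observed and Y2 missing: Y2 is the m1 subset.
rng = random.Random(2001)
imputed = draw_i_step(mu, sigma, obs_idx=[0, 2], m1_idx=[1], y_obs=[0.4, 2.5], rng=rng)
```

Variables in the m2 subset are simply left out of `m1_idx`, so they stay missing after the step.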

Posterior Step

The P-step is different from the P-step described in the "MCMC Method for Arbitrary Missing Data" section. Instead of simulating the {\mu} and {\Sigma} parameters from the full imputed data set, the P-step here simulates the {\mu} and {\Sigma} parameters through simulated regression coefficients from regression models based on the imputed data set with a monotone pattern of missingness. The step is similar to the process described in the "Regression Method for Monotone Missing Data" section.

That is, for the variable Yj, a model

Y_{j}={\beta}_{0} + {\beta}_{1} \, Y_{1} + {\beta}_{2} \, Y_{2} + ... + {\beta}_{j-1} \, Y_{j-1}

is fitted using nonmissing observations.

The fitted model consists of the regression parameter estimates \hat{{\beta}}=(\hat{\beta}_{0}, \hat{\beta}_{1}, ... , \hat{\beta}_{j-1}) and the associated covariance matrix \hat{\sigma}_{j}^2 V_{j}, where V_{j} is the usual X'X inverse matrix from the intercept and variables Y_{1}, Y_{2}, ... , Y_{j-1}.

For each imputation, new parameters {{\beta}}_{*}=({\beta}_{*0}, {\beta}_{*1}, ... , {\beta}_{*(j-1)}) and {\sigma}_{*j}^2 are drawn from the posterior predictive distribution of the parameters. That is, they are simulated from (\hat{\beta}_{0}, \hat{\beta}_{1}, ... , \hat{\beta}_{j-1}), \hat{\sigma}_{j}^2, and V_{j}. The variance is drawn as

{\sigma}_{*j}^2=\hat{\sigma}_{j}^2 (n_{j}-j) / g

where g is a {\chi}_{n_{j}-p+j-1}^2 random variate, n_{j} is the number of nonmissing observations for Y_{j}, and p is the number of variables. The regression coefficients are drawn as

{{\beta}}_{*}=\hat{{\beta}} + {\sigma}_{*j} V_{hj}' Z

where V_{hj} is the upper triangular matrix in the Cholesky decomposition V_{j} = V_{hj}' V_{hj} and Z is a vector of j independent standard normal random variates.
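The two draws above can be sketched in Python for a single variable Y_j. This is a minimal illustration, not the PROC MI implementation. The function names and the numeric inputs are hypothetical, and the sketch assumes that p is the total number of variables and that the fitted quantities (the coefficient estimates, the residual variance estimate, and V_j) are already available.

```python
import math
import random


def cholesky_upper(v):
    """Upper triangular U with v = U' U for a symmetric positive definite v."""
    n = len(v)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            s = sum(u[m][i] * u[m][k] for m in range(i))
            if i == k:
                u[i][i] = math.sqrt(v[i][i] - s)
            else:
                u[i][k] = (v[i][k] - s) / u[i][i]
    return u


def draw_regression_params(beta_hat, sigma2_hat, v, n_j, p, rng):
    """Draw (beta_star, sigma2_star) for the regression of Y_j on Y_1..Y_{j-1}.

    beta_hat has j entries: the intercept and j-1 slopes.
    """
    j = len(beta_hat)
    df = n_j - p + j - 1
    # A chi-square variate with df degrees of freedom is Gamma(df/2, scale 2).
    g = rng.gammavariate(df / 2.0, 2.0)
    sigma2_star = sigma2_hat * (n_j - j) / g
    sigma_star = math.sqrt(sigma2_star)
    u = cholesky_upper(v)  # plays the role of V_hj
    z = [rng.gauss(0.0, 1.0) for _ in range(j)]
    # Element i of V_hj' Z is sum over k <= i of U[k][i] * z[k].
    beta_star = [beta_hat[i] + sigma_star * sum(u[k][i] * z[k] for k in range(i + 1))
                 for i in range(j)]
    return beta_star, sigma2_star


# Hypothetical fit for Y_3 on an intercept, Y_1, and Y_2 (j = 3) with p = 4 variables.
beta_hat = [1.2, 0.5, -0.3]
sigma2_hat = 0.8
v = [[0.20, 0.01, 0.02], [0.01, 0.05, 0.00], [0.02, 0.00, 0.04]]
rng = random.Random(1997)
beta_star, sigma2_star = draw_regression_params(beta_hat, sigma2_hat, v, n_j=30, p=4, rng=rng)
```

Repeating the draw for each variable in the VAR order gives one set of simulated regression parameters per P-step.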

These simulated values of {{\beta}}_{*} and {\sigma}_{*j}^2 are then used to re-create the parameters {\mu} and {\Sigma}. For a detailed description of how to produce monotone missingness with the MCMC method for multivariate normal data, refer to Schafer (1997, pp. 226-235).


Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.