The MI Procedure

Regression Method for Monotone Missing Data

A data set with variables Y1, Y2, ..., Yp (in that order) is said to have a monotone missing pattern when the event that a variable Yj is observed for a particular individual implies that all previous variables Yk, k < j, are also observed for that individual.
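To illustrate the definition, the following Python sketch checks whether a data matrix has a monotone missing pattern. It is only an illustration under stated assumptions: NumPy is available, the columns of the array are Y1, Y2, ..., Yp in order, np.nan marks a missing value, and the function name is_monotone is hypothetical (it is not part of SAS).

```python
import numpy as np


def is_monotone(data: np.ndarray) -> bool:
    """Return True if the missing pattern of `data` is monotone.

    Hypothetical helper. Columns are Y1, ..., Yp in order; np.nan marks a
    missing value. Monotone means every row is "observed, then missing":
    if Yj is observed, every earlier Yk (k < j) is observed as well.
    """
    observed = ~np.isnan(data)
    # Along each row, a step of +1 means missing -> observed,
    # which is exactly what a monotone pattern forbids.
    steps = np.diff(observed.astype(int), axis=1)
    return not np.any(steps > 0)
```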

In the regression method, a regression model is fitted for each variable with missing values, with the previous variables as covariates. Based on the fitted regression coefficients, new regression parameters are drawn from their posterior distribution, and the resulting model is used to impute the missing values for that variable (Rubin 1987, pp. 166-167). The process is repeated sequentially for each variable with missing values. That is, for a variable Yj with missing values, a model

Y_{j}={\beta}_{0} + {\beta}_{1} \, Y_{1} + {\beta}_{2} \, Y_{2} + ... + {\beta}_{j-1} \, Y_{j-1}

is fitted using observations with observed values for variables Y1, Y2, ..., Yj.
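Selecting the fitting observations for Yj can be sketched as follows. This minimal example assumes NumPy, np.nan as the missing-value marker, a 1-based index j (as in the text, with j >= 2), and a hypothetical helper name regression_rows. It keeps only rows in which Y1, ..., Yj are all observed and prepends an intercept column, so the design matrix has j columns.

```python
import numpy as np


def regression_rows(data: np.ndarray, j: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (X, y) for regressing Yj on Y1, ..., Y(j-1).

    Hypothetical helper; j is 1-based (j >= 2). Only rows where
    Y1, ..., Yj are all observed (not np.nan) are used.
    X holds an intercept column followed by Y1, ..., Y(j-1).
    """
    cols = data[:, :j]                      # Y1 .. Yj
    rows = ~np.isnan(cols).any(axis=1)      # all of Y1..Yj observed
    y = cols[rows, j - 1]                   # observed values of Yj
    X = np.column_stack([np.ones(rows.sum()), cols[rows, : j - 1]])
    return X, y
```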

The fitted model includes the regression parameter estimates \hat{\beta}=(\hat{\beta}_{0}, \hat{\beta}_{1}, ... , \hat{\beta}_{j-1}) and the associated covariance matrix \hat{\sigma}_{j}^2 V_{j}, where \hat{\sigma}_{j}^2 is the residual mean square of the fitted model and Vj is the usual (X'X)^{-1} matrix derived from the intercept and variables Y1, Y2, ... , Yj-1.
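A sketch of these fitted quantities, assuming ordinary least squares on a design matrix X that already contains the intercept column (so X has j columns for Yj) and assuming \hat{\sigma}_{j}^2 is the residual mean square with nj - j degrees of freedom. The name fit_regression is hypothetical, and the explicit matrix inverse simply mirrors the (X'X)^{-1} of the text; it is not the numerically preferred way to solve least squares.

```python
import numpy as np


def fit_regression(X: np.ndarray, y: np.ndarray):
    """Least-squares fit returning (beta_hat, sigma2_hat, V).

    Hypothetical helper. X includes the intercept column, so for Yj it
    has j columns; V = (X'X)^{-1} and sigma2_hat is the residual mean
    square with n_j - j degrees of freedom.
    """
    n, k = X.shape
    V = np.linalg.inv(X.T @ X)              # the (X'X)^{-1} matrix V_j
    beta_hat = V @ X.T @ y                  # OLS coefficients
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)    # residual mean square
    return beta_hat, sigma2_hat, V
```

For example, with X built from an intercept and x = 0, 1, 2, 3 and y = 1 + 2x exactly, beta_hat is (1, 2), sigma2_hat is zero up to rounding, and V is [[0.7, -0.3], [-0.3, 0.2]].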

For each imputation, new parameters {\beta}_{*}=({\beta}_{*0}, {\beta}_{*1}, ... , {\beta}_{*(j-1)}) and {\sigma}_{*j}^2 are drawn from the posterior distribution of the parameters. That is, they are simulated from (\hat{\beta}_{0}, \hat{\beta}_{1}, ... , \hat{\beta}_{j-1}), \hat{\sigma}_{j}^2, and Vj. The variance is drawn as

{\sigma}_{*j}^2=\hat{\sigma}_{j}^2 (n_{j}-j) / g

where g is a {\chi}_{n_{j}-j}^2 random variate and nj is the number of nonmissing observations for Yj. The regression coefficients are drawn as

{\beta}_{*}=\hat{\beta} + {\sigma}_{*j} V_{hj}' Z

where Vhj is the upper triangular matrix in the Cholesky decomposition Vj = Vhj' Vhj (so Vhj' is lower triangular), and Z is a vector of j independent standard normal variates. Because Vhj' Vhj = Vj, the added term {\sigma}_{*j} V_{hj}' Z has covariance matrix {\sigma}_{*j}^2 V_{j}.
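The two draws can be sketched with NumPy's random Generator. This is an illustration under stated assumptions, not the SAS implementation: draw_parameters is a hypothetical name, and np.linalg.cholesky returns a lower triangular L with Vj = L L'. Any factor with that property gives the added term the intended covariance {\sigma}_{*j}^2 V_{j}, so L is used here in place of Vhj'.

```python
import numpy as np


def draw_parameters(beta_hat, sigma2_hat, V, n_j, rng):
    """One posterior draw of (beta_star, sigma2_star) for variable Yj.

    Hypothetical helper. j = len(beta_hat) is the number of regression
    parameters (intercept plus Y1..Y(j-1)), so g ~ chi-square(n_j - j).
    Assumes n_j > j and V positive definite.
    """
    j = len(beta_hat)
    g = rng.chisquare(n_j - j)
    sigma2_star = sigma2_hat * (n_j - j) / g
    # Lower triangular L with V = L @ L.T, used as the Cholesky factor.
    L = np.linalg.cholesky(V)
    z = rng.standard_normal(j)              # j independent N(0, 1) variates
    beta_star = beta_hat + np.sqrt(sigma2_star) * (L @ z)
    return beta_star, sigma2_star
```

Each call returns one independent draw, so calling it once per imputation with the same generator gives a different parameter set for each imputation.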

The missing values are then replaced by

{\beta}_{*0} + {\beta}_{*1} \, y_{1} + {\beta}_{*2} \, y_{2} + ... + {\beta}_{*(j-1)} \, y_{j-1} + z_{i} \, {\sigma}_{*j}

where y1, y2, ... , yj-1 are the values of the first j-1 variables for the observation being imputed and zi is a simulated standard normal deviate.
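Putting the steps together, one imputation of a monotone data set can be sketched as the loop below. It is a minimal, hypothetical illustration (the name impute_monotone_regression is not part of SAS) that assumes NumPy, columns Y1, ..., Yp in order, np.nan for missing values, a monotone pattern, a fully observed Y1, and more than j observed values of each Yj. Calling it m times with the same generator yields m imputations.

```python
import numpy as np


def impute_monotone_regression(data: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One imputation of a monotone data set by the regression method.

    Hypothetical sketch: columns are Y1..Yp in order, np.nan is missing,
    the pattern is monotone, and Y1 is fully observed.
    Returns a completed copy of `data`.
    """
    out = data.copy()
    p = out.shape[1]
    if np.isnan(out[:, 0]).any():
        raise ValueError("this sketch assumes Y1 is fully observed")
    for j in range(2, p + 1):                    # 1-based index of Yj
        missing = np.isnan(out[:, j - 1])
        if not missing.any():
            continue
        # Fit Yj on an intercept plus Y1..Y(j-1) using rows where Yj is
        # observed; under a monotone pattern those covariates are observed.
        obs = ~missing
        X_obs = np.column_stack([np.ones(obs.sum()), out[obs, : j - 1]])
        y_obs = out[obs, j - 1]
        n_j = y_obs.size
        if n_j <= j:
            raise ValueError(f"need more than {j} observed values of Y{j}")
        V = np.linalg.inv(X_obs.T @ X_obs)
        beta_hat = V @ X_obs.T @ y_obs
        resid = y_obs - X_obs @ beta_hat
        sigma2_hat = resid @ resid / (n_j - j)
        # Draw sigma*^2 and beta* from their posterior distribution.
        sigma2_star = sigma2_hat * (n_j - j) / rng.chisquare(n_j - j)
        L = np.linalg.cholesky(V)                # V = L @ L.T
        beta_star = beta_hat + np.sqrt(sigma2_star) * (L @ rng.standard_normal(j))
        # Impute beta*_0 + sum_k beta*_k y_k + z_i * sigma*, one z_i per row.
        # Covariates of these rows may include values imputed in earlier steps.
        X_mis = np.column_stack([np.ones(missing.sum()), out[missing, : j - 1]])
        z = rng.standard_normal(missing.sum())
        out[missing, j - 1] = X_mis @ beta_star + z * np.sqrt(sigma2_star)
    return out
```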

Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.