The MI Procedure
In MCMC simulation, one constructs a Markov chain long enough for the distribution of the elements to stabilize to a stationary distribution, which is the distribution of interest. By repeatedly simulating steps of the chain, the method simulates draws from the distribution of interest. Refer to Schafer (1997) for a detailed discussion of this method.
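In Bayesian inference, information about unknown parameters is expressed in the form of a posterior probability distribution. This posterior distribution is computed using Bayes' theorem
\[ p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta)\, p(\theta)\, d\theta} \]
where $\theta$ denotes the unknown parameters and $y$ denotes the data.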
MCMC has been applied as a method for exploring posterior distributions in Bayesian inference. That is, through MCMC, one can simulate the entire joint posterior distribution of the unknown quantities and obtain simulation-based estimates of posterior parameters that are of interest.
In many incomplete data problems, the observed-data posterior $p(\theta \mid Y_{obs})$ is intractable and cannot easily be simulated. However, when Yobs is augmented by an estimated/simulated value of the missing data Ymis, the complete-data posterior $p(\theta \mid Y_{obs}, Y_{mis})$ is much easier to simulate. Assuming that the data are from a multivariate normal distribution, data augmentation can be applied to Bayesian inference with missing data by repeating the following steps:
1. The imputation I-step: Given an estimated mean vector and covariance matrix, the I-step simulates the missing values for each observation independently. That is, if you denote the variables with missing values for observation i by Yi(mis) and the variables with observed values by Yi(obs), then the I-step draws values for Yi(mis) from a conditional distribution for Yi(mis) given Yi(obs).
2. The posterior P-step: Given a complete sample, the P-step simulates the posterior population mean vector and covariance matrix. These new estimates are then used in the next I-step. Without prior information about the parameters, a noninformative prior is used. You can also use other informative priors. For example, prior information about the covariance matrix can help stabilize the inference about the mean vector for a near-singular covariance matrix.
The two steps are iterated long enough for the results to be reliable for a multiply imputed data set (Schafer 1997, p. 72). That is, with a current parameter estimate $\theta^{(t)}$ at the tth iteration, the I-step draws $Y_{mis}^{(t+1)}$ from $p(Y_{mis} \mid Y_{obs}, \theta^{(t)})$ and the P-step draws $\theta^{(t+1)}$ from $p(\theta \mid Y_{obs}, Y_{mis}^{(t+1)})$.
This creates a Markov chain
\[ (Y_{mis}^{(1)}, \theta^{(1)}),\ (Y_{mis}^{(2)}, \theta^{(2)}),\ \ldots \]
which converges in distribution to $p(Y_{mis}, \theta \mid Y_{obs})$. Assuming the iterates converge to a stationary distribution, the goal is to simulate an approximately independent draw of the missing values from this distribution.
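The alternation of the two steps can be sketched in a few lines of Python. This is a minimal univariate illustration, not PROC MI's implementation: with a single variable the mean vector and covariance matrix reduce to a scalar mean and variance, and under a noninformative prior the inverted Wishart draw reduces to a scaled inverse chi-square draw. The function name `data_augmentation`, its arguments, and the sample data are hypothetical.

```python
import random
import statistics

def data_augmentation(y, n_iter=200, seed=1):
    """Univariate sketch of the I-step/P-step iteration.

    y: list of floats, with None marking a missing value.
    Returns (completed_data, mu, sigma2) after n_iter iterations.
    """
    rng = random.Random(seed)
    observed = [v for v in y if v is not None]
    n = len(y)
    # Starting values taken from the observed cases (an assumption).
    mu = statistics.mean(observed)
    sigma2 = statistics.variance(observed)
    completed = list(y)
    for _ in range(n_iter):
        # I-step: draw each missing value given the current parameters.
        completed = [v if v is not None else rng.gauss(mu, sigma2 ** 0.5)
                     for v in y]
        # P-step: draw the variance, then the mean, from the
        # complete-data posterior under a noninformative prior.
        ybar = statistics.mean(completed)
        csscp = sum((v - ybar) ** 2 for v in completed)  # (n-1) S
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)      # chi-square, n-1 df
        sigma2 = csscp / chi2
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
    return completed, mu, sigma2

y = [4.1, None, 5.3, 4.8, None, 5.9, 5.0, 4.4]
completed, mu, sigma2 = data_augmentation(y)
```

Observed values pass through unchanged; only the `None` entries are replaced on each pass. In practice the chain would be run several times with different seeds and starting values, as recommended for validating the imputations.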
To validate the imputation results, you should repeat the process with different random number generators and starting values based on different initial parameter estimates.
The next three sections provide details for the imputation step, Bayesian estimation of the mean vector and covariance matrix, and the posterior step.
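Suppose $\mu = [\mu_1', \mu_2']'$ is the partitioned mean vector of two sets of variables, Yobs and Ymis, where $\mu_1$ is the mean vector for variables Yobs and $\mu_2$ is the mean vector for variables Ymis.
Also suppose
\[ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix} \]
is the partitioned covariance matrix for these variables, where $\Sigma_{11}$ is the covariance matrix for variables Yobs, $\Sigma_{22}$ is the covariance matrix for variables Ymis, and $\Sigma_{12}$ is the covariance matrix between variables Yobs and variables Ymis.
By using the sweep operator (Goodnight 1979) on the pivots of the $\Sigma_{11}$ submatrix, the matrix becomes
\[ \begin{bmatrix} \Sigma_{11}^{-1} & \Sigma_{11}^{-1} \Sigma_{12} \\ -\Sigma_{12}' \Sigma_{11}^{-1} & \Sigma_{22.1} \end{bmatrix} \]
where $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{12}' \Sigma_{11}^{-1} \Sigma_{12}$ can be used to compute the conditional covariance matrix of Ymis after controlling for Yobs.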
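For an observation with the preceding missing pattern, the conditional distribution of Ymis given Yobs = y1 is a multivariate normal distribution with the mean vector
\[ \mu_{2.1} = \mu_2 + \Sigma_{12}' \Sigma_{11}^{-1} (y_1 - \mu_1) \]
and the conditional covariance matrix
\[ \Sigma_{22.1} = \Sigma_{22} - \Sigma_{12}' \Sigma_{11}^{-1} \Sigma_{12} \]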
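As a small worked illustration of the sweep and the conditional moments, the following Python sketch sweeps a 2 x 2 covariance matrix on its first pivot and reads the conditional mean and variance off the result. It assumes the usual Goodnight-style sweep update; the function name `sweep` and the numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def sweep(matrix, pivots):
    """Return a copy of a square matrix swept on the given pivot indices."""
    a = [row[:] for row in matrix]
    size = len(a)
    for k in pivots:
        d = a[k][k]
        # Scale the pivot row by the pivot element.
        a[k] = [v / d for v in a[k]]
        for i in range(size):
            if i == k:
                continue
            b = a[i][k]
            # Eliminate the pivot column from row i.
            a[i] = [a[i][j] - b * a[k][j] for j in range(size)]
            a[i][k] = -b / d
        a[k][k] = 1.0 / d
    return a

# Variable 0 plays the role of Yobs, variable 1 the role of Ymis.
mu = [1.0, 2.0]
sigma = [[4.0, 2.0],
         [2.0, 3.0]]
swept = sweep(sigma, pivots=[0])

y1 = 3.0
# The lower-left block holds -Sigma12' Sigma11^{-1}; the lower-right
# block holds the conditional variance Sigma22.1.
cond_mean = mu[1] - swept[1][0] * (y1 - mu[0])
cond_var = swept[1][1]
```

Here the swept matrix is [[0.25, 0.5], [-0.5, 2.0]], so the missing variable would be drawn from a normal distribution with mean 3.0 and variance 2.0. Sweeping on every pivot yields the inverse of the matrix.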
When each observation $y_i$ is distributed with a multivariate normal distribution with an unknown mean $\mu$, then the CSSCP matrix
\[ A = \sum_i (y_i - \bar{y})(y_i - \bar{y})' \]
has a Wishart distribution $W(n-1, \Sigma)$.
If A has a Wishart distribution $W(n, \Sigma)$, then $B = A^{-1}$ has an inverted Wishart distribution $W^{-1}(n, \Psi)$, where n is the degrees of freedom and $\Psi = \Sigma^{-1}$ is the precision matrix (Anderson 1984).
Note that, instead of using the parameter $\Psi = \Sigma^{-1}$ for the inverted Wishart distribution, Schafer (1997) uses the parameter $\Lambda = \Sigma$.
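Suppose that each observation in the data matrix Y has a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then with a prior inverted Wishart distribution for $\Sigma$ and a prior normal distribution for $\mu$
\[ \Sigma \sim W^{-1}(m, \Psi) \]
\[ \mu \mid \Sigma \sim N\left(\mu_0, \frac{1}{\tau} \Sigma\right) \]
where $\tau > 0$ is a fixed number, the posterior distribution is
\[ \Sigma \mid Y \sim W^{-1}\left(n + m,\ (n-1) S + \Psi + \frac{n \tau}{n + \tau} (\bar{y} - \mu_0)(\bar{y} - \mu_0)'\right) \]
\[ \mu \mid (\Sigma, Y) \sim N\left(\frac{1}{n + \tau} (n \bar{y} + \tau \mu_0),\ \frac{1}{n + \tau} \Sigma\right) \]
where (n-1) S is the CSSCP matrix.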
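The update from prior to posterior parameters is simple arithmetic, which the following Python sketch makes concrete. It assumes the standard conjugate result for a normal mean with an inverted Wishart covariance prior: prior degrees of freedom m and scale matrix Psi, prior mean mu0 with weight tau, giving posterior degrees of freedom n + m, scale matrix (n-1)S + Psi + (n tau / (n + tau)) (ybar - mu0)(ybar - mu0)', and posterior mean (n ybar + tau mu0) / (n + tau). The function name `posterior_parameters` and all numbers are hypothetical.

```python
def posterior_parameters(ybar, csscp, n, mu0, tau, psi, m):
    """Conjugate update for a normal mean vector and covariance matrix.

    ybar: sample mean vector; csscp: (n-1) S as a list of rows;
    mu0, tau: prior mean and its weight;
    psi, m: scale matrix and degrees of freedom of the prior for Sigma.
    Returns (df, scale, mean, cov_factor), where the posterior
    covariance of mu given Sigma is cov_factor * Sigma.
    """
    p = len(ybar)
    diff = [ybar[j] - mu0[j] for j in range(p)]
    w = n * tau / (n + tau)
    scale = [[csscp[i][j] + psi[i][j] + w * diff[i] * diff[j]
              for j in range(p)] for i in range(p)]
    mean = [(n * ybar[j] + tau * mu0[j]) / (n + tau) for j in range(p)]
    return n + m, scale, mean, 1.0 / (n + tau)

df, scale, mean, cov_factor = posterior_parameters(
    ybar=[1.0, 2.0],
    csscp=[[7.0, 1.0], [1.0, 7.0]],
    n=8,
    mu0=[0.0, 0.0],
    tau=2.0,
    psi=[[3.0, 0.0], [0.0, 3.0]],
    m=3,
)
```

With these inputs the posterior mean is pulled from the sample mean (1, 2) toward the prior mean (0, 0) in proportion to tau / (n + tau) = 0.2, giving (0.8, 1.6).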
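You can specify the prior parameter information using one of the following methods:
- PRIOR=JEFFREYS, which uses a noninformative prior
- PRIOR=INPUT=, which reads prior information for $\Sigma$, and optionally for $\mu$, from a data set
- PRIOR=RIDGE=, which uses a ridge prior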
The next four subsections provide details of the posterior step for different prior distributions.


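To obtain the prior distribution for $\Sigma$, PROC MI reads the matrix $S^*$ from observations in the data set with _TYPE_='COV', and it reads $n^* = d^* + 1$ from observations with _TYPE_='N'.
To obtain the prior distribution for $\mu$, PROC MI reads the mean vector $\mu_0$ from observations with _TYPE_='MEAN', and it reads $n_0$ from observations with _TYPE_='N_MEAN'. When there are no observations with _TYPE_='N_MEAN', PROC MI reads $n_0$ from observations with _TYPE_='N'.
These values define the prior distributions
\[ \Sigma \sim W^{-1}(d^*, d^* S^*) \]
\[ \mu \mid \Sigma \sim N\left(\mu_0, \frac{1}{n_0} \Sigma\right) \]
The resulting posterior distribution, as described in the "Bayesian Estimation of the Mean Vector and Covariance Matrix" section, is given by
\[ \Sigma \mid Y \sim W^{-1}\left(n + d^*,\ (n-1) S + d^* S^* + S_m\right) \]
\[ \mu \mid (\Sigma, Y) \sim N\left(\frac{1}{n + n_0} (n \bar{y} + n_0 \mu_0),\ \frac{1}{n + n_0} \Sigma\right) \]
where
\[ S_m = \frac{n n_0}{n + n_0} (\bar{y} - \mu_0)(\bar{y} - \mu_0)' \]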
To obtain the prior distribution for $\Sigma$, PROC MI reads the matrix $S^*$ from observations in the data set with _TYPE_='COV', and it reads $n^*$ from observations with _TYPE_='N'.
Note that if the PRIOR=INPUT= data set also contains observations with _TYPE_='MEAN', then a complete informative prior for both $\mu$ and $\Sigma$ will be used.
Corresponding to the prior for $\Sigma$
\[ \Sigma \sim W^{-1}(d^*, d^* S^*) \]
the posterior distribution for $\Sigma$ (Anderson 1984, p. 269) is
\[ \Sigma \mid Y \sim W^{-1}\left(n + d^* - 1,\ (n-1) S + d^* S^*\right) \]
Thus, an estimate of $\Sigma$ is given by the weighted average
\[ \frac{n-1}{n + d^* - 1} S + \frac{d^*}{n + d^* - 1} S^* \]
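You can request a ridge prior by using the PRIOR=RIDGE= option. You can explicitly specify the number $d^*$ in the PRIOR=RIDGE=d* option. Or you can implicitly specify the number $d^*$ by specifying the proportion p in the PRIOR=RIDGE=p option to request $d^* = (n-1) p$.
The posterior is then given by
\[ \Sigma \mid Y \sim W^{-1}\left(n + d^* - 1,\ (n-1) S + d^* \,\mathrm{Diag}\, S\right) \]
\[ \mu \mid (\Sigma, Y) \sim N\left(\bar{y},\ \frac{1}{n} \Sigma\right) \]
where Diag S is the diagonal matrix whose diagonal elements equal the corresponding elements of S.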
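The effect of a ridge prior can be illustrated with a short Python sketch. It assumes that the ridge prior uses S* = Diag S (the diagonal of the sample covariance matrix S), so that the posterior scale matrix is (n-1) S + d* Diag S. The function names and numbers are hypothetical; this is not PROC MI code.

```python
def ridge_degrees_of_freedom(n, d_star=None, proportion=None):
    """Resolve d* from either an explicit value or a proportion p."""
    if (d_star is None) == (proportion is None):
        raise ValueError("give exactly one of d_star or proportion")
    return d_star if d_star is not None else (n - 1) * proportion

def ridge_posterior_scale(s, n, d_star):
    """Return (n-1) S + d* Diag S for a covariance matrix S (list of rows)."""
    size = len(s)
    return [[(n - 1) * s[i][j] + (d_star * s[i][i] if i == j else 0.0)
             for j in range(size)] for i in range(size)]

def correlation(m):
    """Correlation implied by a 2 x 2 covariance-type matrix."""
    return m[0][1] / (m[0][0] * m[1][1]) ** 0.5

s = [[4.0, 2.0],
     [2.0, 3.0]]
n = 21
d_star = ridge_degrees_of_freedom(n, proportion=0.25)  # (21 - 1) * 0.25
scale = ridge_posterior_scale(s, n, d_star)
```

Only the diagonal of the scale matrix is inflated, so the implied correlation shrinks toward zero (here from about 0.58 to about 0.46). This is how the ridge prior stabilizes inference when the covariance matrix is nearly singular.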
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.