The MI Procedure
Output Data Sets
You can specify the output data set of imputed values with the OUT= option in the PROC MI statement. When an EM statement is used, you can use the OUT= option in the EM statement to specify a data set that contains the original data with missing values replaced by the expected values from the EM algorithm. You can also use the OUTEM= option in the EM statement to specify a data set that contains the MLE computed with the EM algorithm.
When an MCMC method is used, you can use the OUTEST= option in the MCMC statement to specify a data set that contains the parameter estimates used in each imputation, and the OUTITER= option in the MCMC statement to specify a data set that contains the parameters used in the imputation step for each iteration.
The OUT= data set specified in the PROC MI statement contains all the variables in the original data set and a new variable named _Imputation_ that identifies the imputation. For each imputation, the data set contains all variables in the input DATA= data set, with missing values replaced by imputed values. Note that when the NIMPUTE=1 option is specified, the variable _Imputation_ is not created.
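To make the stacked layout concrete, here is a minimal Python sketch (not SAS code) that combines m completed copies of a small data set the way the paragraph above describes. Rows are modeled as Python dictionaries, the function name `stack_imputations` is hypothetical, and the toy values stand in for values that PROC MI would impute.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def stack_imputations(imputed_copies: List[List[Row]]) -> List[Row]:
    """Stack completed copies of the input data in an OUT=-style layout.

    Each copy is a list of rows (dicts) whose missing values have already
    been filled in. With two or more copies, every row is tagged with an
    _Imputation_ index (1, 2, ..., m); with a single copy, no _Imputation_
    key is added, mirroring the NIMPUTE=1 behavior.
    """
    if len(imputed_copies) == 1:
        return [dict(row) for row in imputed_copies[0]]
    stacked: List[Row] = []
    for index, completed in enumerate(imputed_copies, start=1):
        for row in completed:
            # _Imputation_ comes first, followed by all original variables.
            stacked.append({"_Imputation_": index, **row})
    return stacked


copies = [
    [{"x": 1.0, "y": 2.5}, {"x": 3.0, "y": 4.1}],  # toy imputation 1
    [{"x": 1.0, "y": 2.9}, {"x": 3.0, "y": 3.8}],  # toy imputation 2
]
out = stack_imputations(copies)
print(len(out), out[2])
```

Because every completed copy is kept in full, the stacked result has m times as many rows as the input, and analyses can be run separately within each _Imputation_ value.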
The OUT= data set in an EM statement contains the original data set, with missing values replaced by the expected values from the EM algorithm.
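As we read it, "expected values" here means the conditional expectations of the missing values given the observed values in the same observation, evaluated at the final EM estimates. That reading is our assumption, not a description of the PROC MI implementation. The Python sketch below shows the idea for two variables only, under a bivariate normal model; `fill_conditional_means` and the toy mean and covariance are hypothetical (in practice they would come from the EM estimates, for example the OUTEM= data set).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def fill_conditional_means(
    rows: List[Row],
    mean: Dict[str, float],
    cov: Dict[str, Dict[str, float]],
    x: str,
    y: str,
) -> List[Dict[str, float]]:
    """Replace None values in columns x and y with their conditional
    expectations under a bivariate normal model with the given mean and
    covariance (assumed here to be the final EM estimates)."""
    filled: List[Dict[str, float]] = []
    for row in rows:
        vx, vy = row[x], row[y]
        if vx is None and vy is None:
            # Nothing observed in this row: the expectation is the mean.
            vx, vy = mean[x], mean[y]
        elif vy is None:
            # E[y | x] = mu_y + (s_xy / s_xx) * (x - mu_x)
            vy = mean[y] + cov[x][y] / cov[x][x] * (vx - mean[x])
        elif vx is None:
            # E[x | y] = mu_x + (s_xy / s_yy) * (y - mu_y)
            vx = mean[x] + cov[x][y] / cov[y][y] * (vy - mean[y])
        filled.append({**row, x: vx, y: vy})
    return filled


mean = {"x": 0.0, "y": 10.0}
cov = {"x": {"x": 4.0, "y": 2.0}, "y": {"x": 2.0, "y": 9.0}}
rows = [{"x": 2.0, "y": None}, {"x": None, "y": 13.0}, {"x": None, "y": None}]
print(fill_conditional_means(rows, mean, cov, "x", "y"))
```

With more variables, the same formula generalizes by using the covariance block of the observed variables in place of the single variance.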
The OUTEM= data set is a TYPE=COV data set and contains the MLE computed with the EM algorithm. The observations with _TYPE_=‘MEAN’ contain the estimated mean and the observations with _TYPE_=‘COV’ contain the estimated covariances.
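The following Python sketch arranges a mean vector and covariance matrix in the row layout described above: one _TYPE_='MEAN' row, then one _TYPE_='COV' row per variable. The _NAME_ column (blank on the MEAN row, the variable name on each COV row) follows the usual convention for SAS TYPE=COV data sets, but treat the exact column set as an assumption; `outem_rows` and the toy numbers are hypothetical.

```python
from typing import Any, Dict, List


def outem_rows(
    names: List[str], mean: List[float], cov: List[List[float]]
) -> List[Dict[str, Any]]:
    """Arrange a mean vector and covariance matrix in a TYPE=COV-style layout:
    one _TYPE_='MEAN' row, then one _TYPE_='COV' row per variable, with _NAME_
    identifying which row of the covariance matrix it holds."""
    if len(mean) != len(names) or len(cov) != len(names):
        raise ValueError("mean and cov must match the number of variables")
    rows: List[Dict[str, Any]] = [
        {"_TYPE_": "MEAN", "_NAME_": "", **dict(zip(names, mean))}
    ]
    for name, cov_row in zip(names, cov):
        rows.append({"_TYPE_": "COV", "_NAME_": name, **dict(zip(names, cov_row))})
    return rows


outem = outem_rows(["x", "y"], [0.0, 10.0], [[4.0, 2.0], [2.0, 9.0]])
for row in outem:
    print(row)
```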
The OUTEST= data set is a TYPE=EST data set and contains parameter estimates used in each imputation in the MCMC method. It also includes an index variable named _Imputation_, which identifies the imputation.
The observations with _TYPE_=‘SEED’ contain the seed information for the random number generator. The observations with _TYPE_=‘PARM’ or _TYPE_=‘PARMS’ contain the point estimates, and the observations with _TYPE_=‘COV’ or _TYPE_=‘COVB’ contain the associated covariances. These estimates are used as the parameters of the reference distribution from which values are imputed for observations in the DATA= data set.
Note that these estimates are the values used in the I-step before each imputation; they are not the parameter values simulated in the P-step of the same iteration. See Example 54.9 for an illustration of this option.
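A minimal Python sketch of reading these rows back: for one imputation, it collects the PARM/PARMS row as the point estimates and the COV/COVB rows as the covariance matrix, and skips SEED rows. It assumes the rows have been exported as dictionaries and that COV rows carry a _NAME_ column identifying the matrix row (the usual TYPE=EST convention, stated here as an assumption); `parameters_for_imputation` and the toy values are hypothetical.

```python
from typing import Any, Dict, List, Tuple

META = ("_Imputation_", "_TYPE_", "_NAME_")


def parameters_for_imputation(
    rows: List[Dict[str, Any]], imputation: int
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Collect the point estimates and covariances used for one imputation
    from OUTEST=-style rows (modeled here as dicts)."""
    mean: Dict[str, float] = {}
    cov: Dict[str, Dict[str, float]] = {}
    for row in rows:
        if row["_Imputation_"] != imputation:
            continue
        kind = row["_TYPE_"].strip().upper()
        values = {k: v for k, v in row.items() if k not in META}
        if kind in ("PARM", "PARMS"):
            mean = values
        elif kind in ("COV", "COVB"):
            cov[row["_NAME_"]] = values
        # _TYPE_='SEED' rows hold random-number-generator information; skipped.
    if not mean:
        raise ValueError(f"no PARM/PARMS row for imputation {imputation}")
    return mean, cov


outest = [
    # SEED values are placeholders; their real layout is not modeled here.
    {"_Imputation_": 1, "_TYPE_": "SEED", "_NAME_": "", "x": None, "y": None},
    {"_Imputation_": 1, "_TYPE_": "PARM", "_NAME_": "", "x": 0.1, "y": 9.8},
    {"_Imputation_": 1, "_TYPE_": "COV", "_NAME_": "x", "x": 4.2, "y": 1.9},
    {"_Imputation_": 1, "_TYPE_": "COV", "_NAME_": "y", "x": 1.9, "y": 8.7},
    {"_Imputation_": 2, "_TYPE_": "PARM", "_NAME_": "", "x": -0.2, "y": 10.3},
]
mu, sigma = parameters_for_imputation(outest, 1)
print(mu, sigma)
```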
The OUTITER= data set in an EM statement is a TYPE=COV data set and contains parameters for each iteration. It also includes a variable _Iteration_ that provides the iteration number.
The parameters in the output data set depend on the options specified. You can specify the MEAN and COV options for OUTITER. With the MEAN option, the output data set contains the mean parameters in observations with the variable _TYPE_=‘MEAN’. Similarly, with the COV option, the output data set contains the covariance parameters in observations with the variable _TYPE_=‘COV’. When no options are specified, the output data set contains the mean parameters for each iteration.
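The option logic above can be sketched in Python as a function that builds EM OUTITER=-style rows from a per-iteration history of (iteration, mean, covariance). MEAN adds one _TYPE_='MEAN' row per iteration, COV adds one _TYPE_='COV' row per variable per iteration, and no options means MEAN only. The function name, the _NAME_ column, and the dictionary row model are assumptions for illustration; the iteration numbers are simply whatever the caller supplies.

```python
from typing import Any, Dict, List, Sequence, Tuple

History = Sequence[Tuple[int, List[float], List[List[float]]]]


def em_outiter_rows(
    names: List[str], history: History, options: Sequence[str] = ()
) -> List[Dict[str, Any]]:
    """Build EM OUTITER=-style rows from per-iteration (iteration, mean, cov).

    MEAN adds a _TYPE_='MEAN' row per iteration and COV adds one _TYPE_='COV'
    row per variable per iteration; with no options, only MEAN rows appear.
    """
    chosen = {opt.upper() for opt in options} or {"MEAN"}
    unknown = chosen - {"MEAN", "COV"}
    if unknown:
        raise ValueError(f"unsupported option(s): {sorted(unknown)}")
    rows: List[Dict[str, Any]] = []
    for iteration, mean, cov in history:
        if "MEAN" in chosen:
            rows.append({"_Iteration_": iteration, "_TYPE_": "MEAN",
                         "_NAME_": "", **dict(zip(names, mean))})
        if "COV" in chosen:
            for name, cov_row in zip(names, cov):
                rows.append({"_Iteration_": iteration, "_TYPE_": "COV",
                             "_NAME_": name, **dict(zip(names, cov_row))})
    return rows


history = [(1, [1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]),
           (2, [1.2, 2.1], [[1.1, 0.3], [0.3, 0.9]])]
default_rows = em_outiter_rows(["x", "y"], history)
both_rows = em_outiter_rows(["x", "y"], history, ["MEAN", "COV"])
print(len(default_rows), len(both_rows))
```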
The OUTITER= data set in an MCMC statement is a TYPE=COV data set and contains parameters used in the imputation step for each iteration. It also includes variables named _Imputation_ and _Iteration_, which provide the imputation number and iteration number.
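Because each observation carries both _Imputation_ and _Iteration_, per-imputation trace series (useful when inspecting convergence) can be extracted by grouping on _Imputation_ and ordering by _Iteration_. The Python sketch below does this for the mean of one variable. It assumes the rows are exported as dictionaries and that each _TYPE_='MEAN' row stores each variable's mean in a column named after that variable; `mean_traces` and the toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def mean_traces(
    rows: List[Dict[str, Any]], variable: str
) -> Dict[int, List[Tuple[int, float]]]:
    """Group the _TYPE_='MEAN' values of one variable by _Imputation_, as
    (iteration, value) pairs sorted by _Iteration_."""
    traces: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for row in rows:
        if row["_TYPE_"].strip().upper() == "MEAN":
            traces[row["_Imputation_"]].append((row["_Iteration_"], row[variable]))
    return {imp: sorted(pairs) for imp, pairs in traces.items()}


outiter = [
    {"_Imputation_": 1, "_Iteration_": 2, "_TYPE_": "MEAN", "x": 0.4},
    {"_Imputation_": 1, "_Iteration_": 1, "_TYPE_": "MEAN", "x": 0.2},
    {"_Imputation_": 2, "_Iteration_": 1, "_TYPE_": "MEAN", "x": 0.3},
    {"_Imputation_": 1, "_Iteration_": 1, "_TYPE_": "STD", "x": 1.1},
]
print(mean_traces(outiter, "x"))
```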
The parameters in the output data set depend on the options specified. Table 54.4 summarizes the options available for OUTITER and the corresponding values for the output variable _TYPE_.
Option | Output Parameters | _TYPE_
---|---|---
MEAN | mean parameters | MEAN
STD | standard deviations | STD
COV | covariances | COV
LR | -2 log LR statistic | LOG_LR
LR_POST | -2 log LR statistic of the posterior mode | LOG_POST
WLF | worst linear function | WLF
When no options are specified, the output data set contains the mean parameters used in the imputation step for each iteration. For a detailed description of the worst linear function and LR statistics, see the section Checking Convergence in MCMC.
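The option-to-_TYPE_ correspondence in Table 54.4, together with the default just described, can be captured in a small lookup. The Python sketch below is an illustration only: the dictionary is transcribed from the table, while `expected_types` is a hypothetical helper that lists which _TYPE_ values to expect for a given set of options (case-insensitive, duplicates removed).

```python
from typing import Dict, Iterable, List

# Option -> _TYPE_ value, transcribed from Table 54.4.
OUTITER_TYPE: Dict[str, str] = {
    "MEAN": "MEAN",
    "STD": "STD",
    "COV": "COV",
    "LR": "LOG_LR",
    "LR_POST": "LOG_POST",
    "WLF": "WLF",
}


def expected_types(options: Iterable[str]) -> List[str]:
    """Return the _TYPE_ values to expect in an MCMC OUTITER= data set for the
    given options, in the order given; with no options, only MEAN."""
    types: List[str] = []
    for option in options:
        key = option.upper()
        if key not in OUTITER_TYPE:
            raise ValueError(f"unknown OUTITER option: {option}")
        if OUTITER_TYPE[key] not in types:
            types.append(OUTITER_TYPE[key])
    return types or ["MEAN"]


print(expected_types(["lr", "LR_POST", "wlf"]))
```

Note that the _TYPE_ value matches the option name except for LR and LR_POST, which is the main reason to keep such a lookup rather than reuse the option names directly.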
Copyright © SAS Institute, Inc. All Rights Reserved.