The MI Procedure

Output Data Sets

You can specify the output data set of imputed values with the OUT= option in the PROC MI statement. When you use an EM statement, the OUT= option in the EM statement specifies a data set that contains the original data with missing values replaced by the expected values from the EM algorithm, and the OUTEM= option specifies a data set that contains the maximum likelihood estimates (MLEs) computed with the EM algorithm.

When you use the MCMC method, the OUTEST= option in the MCMC statement specifies a data set that contains the parameter estimates used in each imputation, and the OUTITER= option in the MCMC statement specifies a data set that contains the parameters used in the imputation step for each iteration. The OUTITER= option is also available in the EM and FCS statements.

OUT=SAS-data-set in the PROC MI statement

The OUT= data set contains all the variables in the input DATA= data set and a new variable named _Imputation_ that identifies the imputation. For each imputation, the data set contains the input observations with missing values replaced by imputed values. Note that when the NIMPUTE=1 option is specified, the variable _Imputation_ is not created.
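As a minimal sketch of this option, the following step requests five imputations and stacks them in one output data set. It assumes that the Sashelp.Heart sample data set is available; the output name work.heart_mi and the seed value are hypothetical choices made for illustration.

```sas
/* Impute missing values in three analysis variables five times.     */
/* work.heart_mi and the seed are illustrative, not from the source. */
proc mi data=sashelp.heart nimpute=5 seed=20240101 out=work.heart_mi;
   var Height Weight Cholesterol;
run;

/* Count the observations stored for each value of _Imputation_. */
proc freq data=work.heart_mi;
   tables _Imputation_;
run;
```

Stacking the imputations in one data set keyed by _Imputation_ makes it straightforward to run a later analysis with a BY _Imputation_ statement.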

OUT=SAS-data-set in an EM statement

The OUT= data set contains the original data set with missing values replaced by the expected values from the EM algorithm.

OUTEM=SAS-data-set

The OUTEM= data set is a TYPE=COV data set and contains the MLEs computed with the EM algorithm. The observations with _TYPE_='MEAN' contain the estimated means, and the observations with _TYPE_='COV' contain the estimated covariances.
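The two EM output data sets can be requested in the same EM statement, as in the sketch below. It assumes that the Sashelp.Heart sample data set is available and that only the EM results are wanted, so NIMPUTE=0 is used to skip the imputations; the data set names work.heart_emfill and work.heart_emmle are hypothetical.

```sas
/* nimpute=0: assume only the EM results are of interest here. */
proc mi data=sashelp.heart nimpute=0;
   em out=work.heart_emfill    /* missing values replaced by EM expected values */
      outem=work.heart_emmle;  /* TYPE=COV data set that holds the EM MLEs      */
   var Height Weight Cholesterol;
run;

/* List the estimated means and covariances (_TYPE_='MEAN' and 'COV'). */
proc print data=work.heart_emmle;
run;
```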

OUTEST=SAS-data-set

The OUTEST= data set is a TYPE=EST data set and contains parameter estimates used in each imputation in the MCMC method. It also includes an index variable named _Imputation_, which identifies the imputation.

The observations with _TYPE_='SEED' contain the seed information for the random number generator. The observations with _TYPE_='PARM' or _TYPE_='PARMS' contain the point estimates, and the observations with _TYPE_='COV' or _TYPE_='COVB' contain the associated covariances. These estimates are used as the parameters of the reference distribution to impute values for observations in the DATA= data set.

Note that these estimates are the values used in the I-step before each imputation; they are not the parameter values simulated in the P-step of the same iteration. For an example that uses this option, see Example 57.12.
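The following sketch shows one way to save and inspect these estimates. It assumes that the Sashelp.Heart sample data set is available; the data set names work.heart_mi and work.heart_est and the seed value are hypothetical.

```sas
/* Save the parameter estimates used in each of three imputations. */
proc mi data=sashelp.heart nimpute=3 seed=20240101 out=work.heart_mi;
   mcmc outest=work.heart_est;
   var Height Weight Cholesterol;
run;

/* Show only the point estimates; the _TYPE_ values follow the text above. */
proc print data=work.heart_est;
   where _TYPE_ in ('PARM', 'PARMS');
run;
```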

OUTITER <(options)> =SAS-data-set in an EM statement

The OUTITER= data set in an EM statement is a TYPE=COV data set and contains parameters for each iteration. It also includes a variable _Iteration_ that provides the iteration number.

The parameters in the output data set depend on the options specified. You can specify the MEAN and COV options for OUTITER=. With the MEAN option, the output data set contains the mean parameters in observations with _TYPE_='MEAN'. Similarly, with the COV option, the output data set contains the covariance parameters in observations with _TYPE_='COV'. When no options are specified, the output data set contains the mean parameters for each iteration.
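A short sketch of requesting both kinds of parameters in an EM statement follows. It assumes that the Sashelp.Heart sample data set is available and that only the EM iteration history is wanted (hence NIMPUTE=0); the data set name work.heart_emiter is hypothetical.

```sas
/* Save the mean and covariance parameters from every EM iteration. */
proc mi data=sashelp.heart nimpute=0;
   em outiter(mean cov)=work.heart_emiter;
   var Height Weight Cholesterol;
run;

/* Look at the first few rows; _Iteration_ gives the iteration number. */
proc print data=work.heart_emiter(obs=20);
run;
```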

OUTITER <(options)> =SAS-data-set in an FCS statement

The OUTITER= data set in an FCS statement is a TYPE=COV data set and contains parameters for each iteration. It also includes variables named _Imputation_ and _Iteration_, which provide the imputation number and iteration number.

The parameters in the output data set depend on the options specified. You can specify the MEAN and STD options for OUTITER=. With the MEAN option, the output data set contains the mean parameters used in the imputation in observations with _TYPE_='MEAN'. Similarly, with the STD option, the output data set contains the standard deviation parameters used in the imputation in observations with _TYPE_='STD'. When no options are specified, the output data set contains the mean parameters for each iteration.
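The sketch below requests both options in an FCS statement. It assumes that the Sashelp.Heart sample data set is available; the data set names work.heart_fcs and work.heart_fcsiter, the seed, and the NBITER=10 setting are hypothetical choices made for illustration.

```sas
/* Save means and standard deviations from each FCS iteration. */
proc mi data=sashelp.heart nimpute=3 seed=20240101 out=work.heart_fcs;
   fcs nbiter=10 outiter(mean std)=work.heart_fcsiter;
   var Height Weight Cholesterol;
run;

/* Inspect the iteration history for the first imputation only. */
proc print data=work.heart_fcsiter;
   where _Imputation_ = 1;
run;
```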

OUTITER <(options)> =SAS-data-set in an MCMC statement

The OUTITER= data set in an MCMC statement is a TYPE=COV data set and contains parameters used in the imputation step for each iteration. It also includes variables named _Imputation_ and _Iteration_, which provide the imputation number and iteration number.

The parameters in the output data set depend on the options specified. Table 57.6 summarizes the options available for OUTITER and the corresponding values for the output variable _TYPE_.

Table 57.6: Summary of Options for OUTITER in an MCMC statement

Option     Output Parameters                            _TYPE_
-------    -----------------------------------------    --------
MEAN       mean parameters                              MEAN
STD        standard deviations                          STD
COV        covariances                                  COV
LR         –2 log LR statistic                          LOG_LR
LR_POST    –2 log LR statistic of the posterior mode    LOG_POST
WLF        worst linear function                        WLF

When no options are specified, the output data set contains the mean parameters used in the imputation step for each iteration. For a detailed description of the worst linear function and LR statistics, see the section Checking Convergence in MCMC.
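As an illustration of combining several of these options, the sketch below saves the mean parameters, the –2 log LR statistic, and the worst linear function for each iteration. It assumes that the Sashelp.Heart sample data set is available; the data set names work.heart_mi and work.heart_mcmciter and the seed value are hypothetical.

```sas
/* Save selected parameters from each MCMC iteration. */
proc mi data=sashelp.heart nimpute=2 seed=20240101 out=work.heart_mi;
   mcmc outiter(mean lr wlf)=work.heart_mcmciter;
   var Height Weight Cholesterol;
run;

/* Pull the worst linear function rows for the first imputation; */
/* the _TYPE_ value comes from Table 57.6.                       */
proc print data=work.heart_mcmciter;
   where _TYPE_ = 'WLF' and _Imputation_ = 1;
run;
```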