The MI Procedure

MCMC Statement

MCMC < options > ;
The MCMC statement specifies the details of the MCMC method for imputation. The following table summarizes the options available for the MCMC statement.

Table 9.2: Summary of Options in MCMC
Task                                                      Option
Specify data sets
  input parameter estimates for imputations               INEST=
  output parameter estimates used in imputations          OUTEST=
  output parameter estimates used in iterations           OUTITER=
Specify imputation details
  monotone/full imputation                                IMPUTE=
  single/multiple chain                                   CHAIN=
  number of burn-in iterations for each chain             NBITER=
  number of iterations between imputations in a chain     NITER=
  initial parameter estimates for MCMC                    INITIAL=
  prior parameter information                             PRIOR=
  starting parameters                                     START=
Specify output graphics
  displays time-series plots                              TIMEPLOT
  displays autocorrelation plots                          ACFPLOT
  graphics catalog name for saving graphics output        GOUT=
Control printed output
  displays worst linear function                          WLF
  displays initial parameter values for MCMC              DISPLAYINIT


The following are the options available for the MCMC statement (in alphabetical order):

ACFPLOT < ( options < / display-options > ) >
displays the autocorrelation function plots of parameters from iterations.

The available options are:

COV < ( < variables > < variable1*variable2 > < ... variable1*variable2 > ) >
displays plots of variances for variables in the list and covariances for pairs of variables in the list. When the option COV is specified without variables, variances for all variables and covariances for all pairs of variables are used.

MEAN < ( variables ) >
displays plots of means for variables in the list. When the option MEAN is specified without variables, all variables are used.

WLF
displays the plot for the worst linear function.

When the ACFPLOT option is specified without any of the preceding options, the procedure displays plots of means for all variables.

The display-options provide additional information for the autocorrelation function plots. The available display-options are:

CCONF=color
specifies the color of the displayed confidence limits. The default is CCONF=BLACK.

CFRAME=color
specifies the color for filling the area enclosed by the axes and the frame. By default, this area is not filled.

CNEEDLES=color
specifies the color of the vertical line segments (needles) that connect autocorrelations to the reference line. The default is CNEEDLES=BLACK.

CREF=color
specifies the color of the displayed reference line. The default is CREF=BLACK.

CSYMBOL=color
specifies the color of the displayed data points. The default is CSYMBOL=BLACK.

HSYMBOL=number
specifies the height for data points in percentage screen units. The default is HSYMBOL=1.

LCONF=linetype
specifies the line type for the displayed confidence limits. The default is LCONF=1, a solid line.

LOG
requests that the logarithmic transformations of parameters be used to compute the autocorrelations. This option is generally used for the variances of variables. When a parameter has values less than or equal to zero, the corresponding plot is not created.

LREF=linetype
specifies the line type for the displayed reference line. The default is LREF=3, a dashed line.

NLAG=number
specifies the maximum lag of the series. The default is NLAG=20. The autocorrelations at each lag are displayed in the graph.

SYMBOL=value
specifies the symbol for data points. The default is SYMBOL=STAR.

TITLE='string'
specifies the title to be displayed in the autocorrelation function plots. The default is TITLE='Autocorrelation Plot'.

WCONF=number
specifies the width for the displayed confidence limits in percentage screen units. If you specify the WCONF=0 option, the confidence limits are not displayed. The default is WCONF=1.

WNEEDLES=number
specifies the width for the displayed needles that connect autocorrelations to the reference line in percentage screen units. If you specify the WNEEDLES=0 option, the needles are not displayed. The default is WNEEDLES=1.

WREF=number
specifies the width for the displayed reference line in percentage screen units. If you specify the WREF=0 option, the reference line is not displayed. The default is WREF=1.

For example, the statement

   mcmc acfplot(mean(y1) cov(y1) / log);
requests autocorrelation function plots for the mean and the variance of the variable y1. Logarithmic transformations of both the mean and the variance are used in the plots. For a detailed description of the autocorrelation function plot, see the "Autocorrelation Function Plot" section; refer also to Schafer (1997, pp. 120-126) and the SAS/ETS User's Guide, Version 8.
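
As a minimal sketch of how the ACFPLOT option might sit inside a complete PROC MI step, the following combines plot options with display-options described above. The data set name mydata, the output name mi_out, the variables y1-y3, and the particular option values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* Autocorrelation plots for the mean of y1, the variance of y1,  */
   /* the covariance of y1 and y2, and the worst linear function.    */
   /* Display-options follow the slash: 30 lags, blue confidence     */
   /* limits, and wider needles than the default.                    */
   mcmc acfplot(mean(y1) cov(y1 y1*y2) wlf / nlag=30 cconf=blue wneedles=2);
   var y1 y2 y3;
run;
```

The LOG display-option is omitted here because means, covariances, and the worst linear function can take values less than or equal to zero, in which case the corresponding plot would not be created.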

CHAIN=SINGLE | MULTIPLE
specifies whether a single chain is used for all imputations or a separate chain is used for each imputation. The default is CHAIN=SINGLE.

DISPLAYINIT
displays initial parameter values in the MCMC process for each imputation.

GOUT=graphics-catalog
specifies the graphics catalog for saving graphics output from PROC MI. The default is WORK.GSEG. For more information, refer to the chapter "The GREPLAY Procedure" in SAS/GRAPH Software: Reference, Version 8.

IMPUTE=FULL | MONOTONE
specifies whether a full-data imputation is used for all missing values or a monotone-data imputation is used for a subset of missing values to make the imputed data sets have a monotone missing pattern. The default is IMPUTE=FULL. When IMPUTE=MONOTONE is specified, the order in the VAR statement is used to complete the monotone pattern.
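
A short sketch of a monotone-data imputation request follows. The data set mydata and the variables y1-y3 are assumptions made for illustration; the point is that the order of variables in the VAR statement determines the monotone pattern that is completed.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_mono;
   /* Impute only enough missing values to make the pattern monotone */
   mcmc impute=monotone;
   /* The variable order here defines the monotone pattern */
   var y1 y2 y3;
run;
```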

INEST=SAS-data-set
names a SAS data set of TYPE=EST containing parameter estimates for imputations. These estimates are used to impute values for observations in the DATA= data set. A detailed description of the data set is provided in the "Input Data Sets" section.

INITIAL=EM <( options )>
INITIAL=INPUT=SAS-data-set
specifies the initial mean and covariance estimates for the MCMC process. The default is INITIAL=EM.

You can specify INITIAL=INPUT=SAS-data-set to read the initial estimates of the mean and covariance matrix for each imputation from a SAS data set. See the "Input Data Sets" section for a description of this data set.

With INITIAL=EM, PROC MI uses the EM algorithm to derive parameter estimates for a posterior mode, that is, the point of highest observed-data posterior density. The MLE from EM is used to start the EM algorithm for the posterior mode, and the resulting EM estimates are used to begin the MCMC process.

The following four options are available with INITIAL=EM.

BOOTSTRAP < =number >
requests bootstrap resampling, which uses a simple random sample with replacement from the input data set for the initial estimate. You can explicitly specify the number of observations in the random sample. Alternatively, you can implicitly specify the number of observations in the random sample by specifying the proportion p, 0<p<=1, to request [np] observations in the random sample, where n is the number of observations in the data set and [np] is the integer part of np. This produces an overdispersed initial estimate that provides different starting values for the MCMC process. If you specify the BOOTSTRAP option without the number, p=0.75 is used by default.

CONVERGE=p
sets the convergence criterion. The value must be between 0 and 1. The iterations are considered to have converged when the maximum change in the parameter estimates between iteration steps is less than the value specified. The change is a relative change if the parameter is greater than 0.01 in absolute value; otherwise, it is an absolute change. By default, CONVERGE=1E-4.

ITPRINT
prints the iteration history in the EM algorithm for the posterior mode.

MAXITER=number
specifies the maximum number of iterations used in the EM algorithm. The default is MAXITER=200.
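
The four INITIAL=EM options can be combined in one parenthesized list. The following is an illustrative sketch only: the data set mydata, the variables y1-y3, and the specific values for BOOTSTRAP=, CONVERGE=, and MAXITER= are assumptions, not recommendations.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* Start MCMC from EM posterior-mode estimates computed on a      */
   /* bootstrap sample of proportion p=0.5, with a tighter           */
   /* convergence criterion, a higher iteration cap, and the EM      */
   /* iteration history printed.                                     */
   mcmc initial=em(bootstrap=0.5 converge=1e-5 maxiter=500 itprint);
   var y1 y2 y3;
run;
```

Under the definition of BOOTSTRAP above, if mydata contained n=101 observations, BOOTSTRAP=0.5 would request [np] = [50.5] = 50 observations in the random sample.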

NBITER=number
specifies the number of burn-in iterations before the first imputation in each chain. The default is NBITER=200.

NITER=number
specifies the number of iterations between imputations in a single chain. The default is NITER=100.
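
To make the interaction of CHAIN=, NBITER=, and NITER= concrete, here is a hedged sketch with hypothetical names (mydata, mi_out, y1-y3) and illustrative iteration counts.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* One chain for all five imputations: 500 burn-in iterations     */
   /* before the first imputation, then 200 iterations between       */
   /* successive imputations.                                        */
   mcmc chain=single nbiter=500 niter=200;
   var y1 y2 y3;
run;
```

Reading the definitions above literally, this single chain would produce imputations after iterations 500, 700, 900, 1100, and 1300. With CHAIN=MULTIPLE, each imputation would instead come from its own chain, each with its own 500 burn-in iterations.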

OUTEST=SAS-data-set
creates an output SAS data set of TYPE=EST. The data set contains parameter estimates used in each imputation. The data set also includes a variable named _Imputation_ to identify the imputation number. See the "Output Data Sets" section for a description of this data set.

OUTITER < ( options ) > =SAS-data-set
creates an output SAS data set of TYPE=COV containing parameters used in the imputation step for each iteration. The data set includes variables named _Imputation_ and _Iteration_ to identify the imputation number and iteration number.

The parameters in the output data set depend on the options specified. You can specify options MEAN, STD, COV, LR, LR_POST, and WLF to output parameters of means, standard deviations, covariances, -2 log LR statistic, -2 log LR statistic of the posterior mode, and the worst linear function. When no options are specified, the output data set contains the mean parameters used in the imputation step for each iteration. See the "Output Data Sets" section for a description of this data set.
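
The following sketch shows one way the OUTEST= and OUTITER= options might be used together to save parameter estimates for later inspection. The data set names mydata, mi_est, and mi_iter and the variables y1-y3 are hypothetical.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* Save the estimates used in each imputation (TYPE=EST) and the  */
   /* means, covariances, and worst linear function from each        */
   /* iteration (TYPE=COV).                                          */
   mcmc outest=mi_est outiter(mean cov wlf)=mi_iter;
   var y1 y2 y3;
run;

/* Look at the first few saved iterations; _Imputation_ and          */
/* _Iteration_ identify each row.                                    */
proc print data=mi_iter(obs=10);
run;
```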

PRIOR=name
specifies the prior information for the means and covariances. Valid values for name are as follows:

JEFFREYS
specifies a noninformative prior.

RIDGE=number
specifies a ridge prior.

INPUT=SAS-data-set
specifies a data set containing prior information.

For a detailed description of the prior information, see the "Bayesian Estimation of the Mean Vector and Covariance Matrix" section and the "Posterior Step" section. If you do not specify the PRIOR= option, the default is PRIOR=JEFFREYS.

The PRIOR=INPUT= option specifies a TYPE=COV data set from which the prior information of the mean vector and the covariance matrix is read. See the "Input Data Sets" section for a description of this data set.
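
As a brief sketch of the PRIOR= syntax, the step below requests a ridge prior instead of the default Jeffreys prior. The data set mydata, the variables y1-y3, and the ridge value 0.75 are assumptions for illustration; see the sections cited above for how the number is interpreted.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* Ridge prior; the value 0.75 is illustrative only */
   mcmc prior=ridge=0.75;
   var y1 y2 y3;
run;
```

To read prior information from a TYPE=COV data set instead, the same position would take PRIOR=INPUT=SAS-data-set.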

START=VALUE | DIST
specifies that the initial parameter estimates are used either as the starting value (START=VALUE) or as the starting distribution (START=DIST) in the first imputation step of each chain. The default is START=VALUE.

TIMEPLOT < ( options < / display-options > ) >
displays the time-series plots of parameters from iterations.

The available options are:

COV < ( < variables > < variable1*variable2 > < ... variable1*variable2 > ) >
displays plots of variances for variables in the list and covariances for pairs of variables in the list. When the option COV is specified without variables, variances for all variables and covariances for all pairs of variables are used.

MEAN < ( variables ) >
displays plots of means for variables in the list. When the option MEAN is specified without variables, all variables are used.

WLF
displays the plot for the worst linear function.

When the TIMEPLOT option is specified without any of the preceding options, the procedure displays plots of means for all variables.

The display-options provide additional information for the time-series plots. The available display-options are:

CFRAME=color
specifies the color for filling the area enclosed by the axes and the frame. By default, this area is not filled.

CSYMBOL=color
specifies the color of the data points to be displayed in the time-series plots. The default is CSYMBOL=BLACK.

HSYMBOL=number
specifies the height for data points in percentage screen units. The default is HSYMBOL=1.

LOG
requests that the logarithmic transformations of parameters be used. This option is generally used for the variances of variables. When a parameter value is less than or equal to zero, the value is not displayed in the corresponding plot.

SYMBOL=value
specifies the symbol for data points. The default is SYMBOL=PLUS.

TITLE='string'
specifies the title to be displayed in the time-series plots. The default is TITLE='Time-series Plot for Iterations'.

For a detailed description of the time-series plot, see the "Time-Series Plot" section and Schafer (1997, pp. 120-126).
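
A minimal sketch of a TIMEPLOT request with display-options follows. The data set mydata, the variables y1-y3, the color, and the title text are hypothetical choices made for illustration.

```sas
/* Hypothetical input: WORK.MYDATA with numeric variables y1, y2, y3 */
proc mi data=mydata seed=1234 nimpute=5 out=mi_out;
   /* Time-series plot of the variance of y1 across iterations,      */
   /* on the log scale, with blue points and a custom title.         */
   mcmc timeplot(cov(y1) / log csymbol=blue title='Variance of y1 by Iteration');
   var y1 y2 y3;
run;
```

The LOG display-option is applied only to a variance here, since parameter values less than or equal to zero are not displayed when LOG is in effect.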

WLF
displays the worst linear function of parameters. This scalar function of the parameters μ and Σ is "worst" in the sense that its values from iterations converge most slowly among parameters. For a detailed description of this statistic, see the "Worst Linear Function of Parameters" section.


Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.