The MI Procedure

Example 75.1 EM Algorithm for MLE

This example uses the EM algorithm to compute the maximum likelihood estimates (MLE) of the parameters of multivariate normal data with missing values. The following statements invoke the MI procedure and request the EM algorithm to compute the MLE of $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ for a multivariate normal distribution from the input data set Fitness1:

proc mi data=Fitness1 seed=1518971 simple nimpute=0;
   em itprint outem=outem;
   var Oxygen RunTime RunPulse;
run;

Because the NIMPUTE=0 option is specified, the missing values are not imputed; the procedure computes the EM estimates without creating any imputed data sets.
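As a hedged variation on the statements above, the following sketch writes out the EM iteration controls explicitly and also saves the per-iteration estimates. The MAXITER= and CONVERGE= values shown are assumed to match the procedure defaults, and the data set name EmIter is hypothetical; check both against the EM statement syntax for your release.

```sas
/* Sketch only: same analysis with explicit EM controls.            */
/* MAXITER=200 and CONVERGE=1E-4 are assumed to be the defaults.    */
/* EmIter is a hypothetical name for the iteration-history output.  */
proc mi data=Fitness1 seed=1518971 simple nimpute=0;
   em itprint maxiter=200 converge=1e-4 outem=outem outiter=EmIter;
   var Oxygen RunTime RunPulse;
run;
```

Tightening CONVERGE= or lowering MAXITER= would change how many rows appear in the iteration history, but not the form of the remaining output.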

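The "Model Information" table in Output 75.1.1 describes the method and options that the procedure would use if a positive number were specified in the NIMPUTE= option. Because NIMPUTE=0 in this example, the table reports 0 imputations and the listed MCMC settings are not applied.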

Output 75.1.1: Model Information

The MI Procedure

Model Information
Data Set                            WORK.FITNESS1
Method                              MCMC
Multiple Imputation Chain           Single Chain
Initial Estimates for MCMC          EM Posterior Mode
Start                               Starting Value
Prior                               Jeffreys
Number of Imputations               0
Number of Burn-in Iterations        200
Number of Iterations                100
Seed for random number generator    1518971



The "Missing Data Patterns" table in Output 75.1.2 lists distinct missing data patterns with corresponding frequencies and percentages. Here, a value of "X" means that the variable is observed in the corresponding group and a value of "." means that the variable is missing. The table also displays group-specific variable means.

Output 75.1.2: Missing Data Patterns

Missing Data Patterns
                                                   ----------- Group Means -----------
Group  Oxygen  RunTime  RunPulse  Freq  Percent       Oxygen      RunTime     RunPulse
1      X       X        X           21    67.74    46.353810    10.809524   171.666667
2      X       X        .            4    12.90    47.109500    10.137500            .
3      X       .        .            3     9.68    52.461667            .            .
4      .       X        X            1     3.23            .    11.950000   176.000000
5      .       X        .            2     6.45            .     9.885000            .
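The pattern frequencies can be cross-checked outside PROC MI. The following is a minimal sketch, assuming only that Fitness1 contains the three numeric analysis variables; the data set name Patterns and the variable name Pattern are hypothetical.

```sas
/* Sketch: rebuild the X / . pattern for each observation */
data Patterns;
   set Fitness1;
   length Pattern $3;
   /* One character per variable: X if observed, . if missing */
   Pattern = cats(ifc(missing(Oxygen),   '.', 'X'),
                  ifc(missing(RunTime),  '.', 'X'),
                  ifc(missing(RunPulse), '.', 'X'));
run;

/* Count and percentage of observations in each pattern */
proc freq data=Patterns;
   tables Pattern;
run;
```

Each distinct value of Pattern should correspond to one group in the "Missing Data Patterns" table, although PROC FREQ orders the patterns alphabetically rather than by group number.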



With the SIMPLE option, the procedure displays simple descriptive univariate statistics for available cases in the "Univariate Statistics" table in Output 75.1.3 and correlations from pairwise available cases in the "Pairwise Correlations" table in Output 75.1.4.

Output 75.1.3: Univariate Statistics

Univariate Statistics
                                                                  - Missing Values -
Variable    N         Mean     Std Dev      Minimum      Maximum    Count    Percent
Oxygen     28     47.11618     5.41305     37.38800     60.05500        3       9.68
RunTime    28     10.68821     1.37988      8.63000     14.03000        3       9.68
RunPulse   22    171.86364    10.14324    148.00000    186.00000        9      29.03



Output 75.1.4: Pairwise Correlations

Pairwise Correlations
  Oxygen RunTime RunPulse
Oxygen 1.000000000 -0.849118562 -0.343961742
RunTime -0.849118562 1.000000000 0.247258191
RunPulse -0.343961742 0.247258191 1.000000000
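The statistics that the SIMPLE option reports can also be requested directly. The following sketch relies on the default missing-value handling of PROC MEANS (each variable uses its own nonmissing values) and of PROC CORR (pairwise deletion unless the NOMISS option is specified); it is meant as a cross-check, not as a description of how PROC MI computes the tables.

```sas
/* Available-case univariate statistics for each variable */
proc means data=Fitness1 n mean std min max nmiss;
   var Oxygen RunTime RunPulse;
run;

/* Correlations from pairwise available cases (default behavior) */
proc corr data=Fitness1;
   var Oxygen RunTime RunPulse;
run;
```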



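When you use the EM statement, the MI procedure displays the initial parameter estimates for the EM algorithm in the "Initial Parameter Estimates for EM" table in Output 75.1.5. In this example, the initial means and variances are the available-case statistics from Output 75.1.3 (for example, $5.41305^2 \approx 29.3011$ for Oxygen), and the initial covariances are 0.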

Output 75.1.5: Initial Parameter Estimates for EM

Initial Parameter Estimates for EM
_TYPE_ _NAME_ Oxygen RunTime RunPulse
MEAN   47.116179 10.688214 171.863636
COV Oxygen 29.301078 0 0
COV RunTime 0 1.904067 0
COV RunPulse 0 0 102.885281



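When you use the ITPRINT option in the EM statement, the "EM (MLE) Iteration History" table in Output 75.1.6 displays the iteration history for the EM algorithm. For each iteration, the table shows the value of -2 Log L and the current estimates of the three means. Here -2 Log L decreases from 289.544782 at iteration 0 to 254.482800, and it is unchanged to six decimal places from iteration 9 onward.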

Output 75.1.6: EM (MLE) Iteration History

EM (MLE) Iteration History
_Iteration_ -2 Log L Oxygen RunTime RunPulse
0 289.544782 47.116179 10.688214 171.863636
1 263.549489 47.116179 10.688214 171.863636
2 255.851312 47.139089 10.603506 171.538203
3 254.616428 47.122353 10.571685 171.426790
4 254.494971 47.111080 10.560585 171.398296
5 254.483973 47.106523 10.556768 171.389208
6 254.482920 47.104899 10.555485 171.385257
7 254.482813 47.104348 10.555062 171.383345
8 254.482801 47.104165 10.554923 171.382424
9 254.482800 47.104105 10.554878 171.381992
10 254.482800 47.104086 10.554864 171.381796
11 254.482800 47.104079 10.554859 171.381708
12 254.482800 47.104077 10.554858 171.381669
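Each row of the iteration history corresponds to one E-step followed by one M-step. The following is the textbook form of these steps for multivariate normal data, given as a sketch rather than as a statement of the exact computations in PROC MI. The notation is an assumption introduced here: for observation $i$, the subscripts $o$ and $m$ denote its observed and missing components, $n$ is the number of observations, and $\tilde{C}_i$ is $C_i$ padded with zeros in the rows and columns of the observed components.

```latex
% E-step: conditional mean and covariance of the missing part given the observed part
\hat{y}_{i,m} = \mu_m^{(t)} + \Sigma_{mo}^{(t)} \bigl(\Sigma_{oo}^{(t)}\bigr)^{-1} \bigl(y_{i,o} - \mu_o^{(t)}\bigr),
\qquad
C_i = \Sigma_{mm}^{(t)} - \Sigma_{mo}^{(t)} \bigl(\Sigma_{oo}^{(t)}\bigr)^{-1} \Sigma_{om}^{(t)}

% M-step: update the parameters from the completed data
\mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i,
\qquad
\Sigma^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n}
  \Bigl[ \bigl(\hat{y}_i - \mu^{(t+1)}\bigr) \bigl(\hat{y}_i - \mu^{(t+1)}\bigr)^{\top} + \tilde{C}_i \Bigr]
```

When the initial covariances are all 0, $\Sigma_{mo}^{(0)} = 0$, so the first E-step fills in each missing component with its current mean. This is consistent with the mean estimates being identical at iterations 0 and 1 in Output 75.1.6 while -2 Log L still changes because the covariance estimate is updated.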



The "EM (MLE) Parameter Estimates" table in Output 75.1.7 displays the maximum likelihood estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for a multivariate normal distribution from the data set Fitness1.

Output 75.1.7: EM (MLE) Parameter Estimates

EM (MLE) Parameter Estimates
_TYPE_ _NAME_ Oxygen RunTime RunPulse
MEAN   47.104077 10.554858 171.381669
COV Oxygen 27.797931 -6.457975 -18.031298
COV RunTime -6.457975 2.015514 3.516287
COV RunPulse -18.031298 3.516287 97.766857
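To compare these estimates with the pairwise correlations in Output 75.1.4, a covariance can be converted to a correlation by dividing by the product of the two standard deviations. The following worked arithmetic uses the Oxygen and RunTime entries of the table above; the result is rounded and is shown only as an illustration.

```latex
\hat{\rho}_{\text{Oxygen},\,\text{RunTime}}
  = \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}_{11}\,\hat{\sigma}_{22}}}
  = \frac{-6.457975}{\sqrt{27.797931 \times 2.015514}}
  \approx \frac{-6.457975}{7.4851}
  \approx -0.863
```

The corresponding correlation from pairwise available cases is about -0.849, so in this example the two approaches give similar but not identical values.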



You can also save the EM (MLE) parameter estimates in an output data set by using the OUTEM= option. The following statements list the observations in the output data set Outem:

proc print data=outem;
   title 'EM Estimates';
run;

The output data set Outem in Output 75.1.8 is a TYPE=COV data set. The observation with _TYPE_='MEAN' contains the MLE of the parameter $\boldsymbol{\mu}$, and the observations with _TYPE_='COV' contain the MLE of the parameter $\boldsymbol{\Sigma}$ for a multivariate normal distribution from the data set Fitness1.

Output 75.1.8: EM Estimates

EM Estimates

Obs _TYPE_ _NAME_ Oxygen RunTime RunPulse
1 MEAN   47.1041 10.5549 171.382
2 COV Oxygen 27.7979 -6.4580 -18.031
3 COV RunTime -6.4580 2.0155 3.516
4 COV RunPulse -18.0313 3.5163 97.767
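If later steps need the mean vector and the covariance matrix separately, the Outem data set can be split on the _TYPE_ variable. The following is a minimal sketch; the data set names EmMean and EmCov are hypothetical.

```sas
/* Sketch: separate the MEAN row from the COV rows of Outem */
data EmMean EmCov;
   set outem;
   if _TYPE_ = 'MEAN' then output EmMean;       /* estimate of mu    */
   else if _TYPE_ = 'COV' then output EmCov;    /* estimate of Sigma */
run;

proc print data=EmCov noobs;
   title 'EM Covariance Estimates';
run;
```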