This example uses the EM algorithm to compute the maximum likelihood estimates (MLE) for the parameters of multivariate normally distributed data with missing values. The following statements invoke the MI procedure and request the EM algorithm to compute the MLE for (μ, Σ), the mean vector and covariance matrix of a multivariate normal distribution, from the input data set Fitness1:
    proc mi data=Fitness1 seed=1518971 simple nimpute=0;
       em itprint outem=outem;
       var Oxygen RunTime RunPulse;
    run;
Note that when you specify the NIMPUTE=0 option, the missing values are not imputed.
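
To make the computation concrete, the following is a minimal Python sketch of the EM algorithm for a multivariate normal distribution with missing values. It is an illustration under stated assumptions, not the MI procedure's implementation: the function names `solve` and `em_mvn`, the fixed number of iterations, and the toy data are all hypothetical, and missing values are encoded as `None`. The sketch starts from available-case means and variances with zero covariances, replaces the missing entries by their conditional expectations given the observed entries (E-step), and then re-estimates the mean vector and covariance matrix (M-step).

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination (a is n x n, b is n x k)."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def em_mvn(rows, n_iter=50):
    """Hypothetical helper: EM estimates (mu, sigma) from rows with None for missing."""
    n, p = len(rows), len(rows[0])
    # Initial estimates: available-case means and variances, zero covariances.
    mu, sigma = [], [[0.0] * p for _ in range(p)]
    for j in range(p):
        vals = [r[j] for r in rows if r[j] is not None]
        mean = sum(vals) / len(vals)
        mu.append(mean)
        sigma[j][j] = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    for _ in range(n_iter):
        t1 = [0.0] * p                      # expected sum of y
        t2 = [[0.0] * p for _ in range(p)]  # expected sum of y y^T
        for r in rows:
            obs = [j for j in range(p) if r[j] is not None]
            mis = [j for j in range(p) if r[j] is None]
            y = list(r)
            if mis:
                # E-step: regress the missing block on the observed block.
                s_oo = [[sigma[i][j] for j in obs] for i in obs]
                s_om = [[sigma[i][j] for j in mis] for i in obs]
                w = solve(s_oo, s_om)       # w = inverse(s_oo) @ s_om
                dev = [r[j] - mu[j] for j in obs]
                for b, jm in enumerate(mis):
                    y[jm] = mu[jm] + sum(w[a][b] * dev[a] for a in range(len(obs)))
                # Add the conditional covariance of the missing block.
                for b1, j1 in enumerate(mis):
                    for b2, j2 in enumerate(mis):
                        t2[j1][j2] += sigma[j1][j2] - sum(
                            s_om[a][b1] * w[a][b2] for a in range(len(obs)))
            for i in range(p):
                t1[i] += y[i]
                for j in range(p):
                    t2[i][j] += y[i] * y[j]
        # M-step: maximum likelihood update (divides by n, not n - 1).
        mu = [t / n for t in t1]
        sigma = [[t2[i][j] / n - mu[i] * mu[j] for j in range(p)] for i in range(p)]
    return mu, sigma


# Toy values invented for illustration; they are not the Fitness1 data.
data = [
    [44.6, 11.4, 178.0], [45.3, 10.1, 185.0], [54.3, 8.7, 156.0],
    [59.6, None, None], [49.9, 9.2, None], [None, 11.6, 176.0],
    [39.4, 13.1, 174.0], [50.5, 9.9, 148.0], [47.3, 10.6, None],
    [None, 10.9, None],
]
mu_hat, sigma_hat = em_mvn(data)
print([round(v, 3) for v in mu_hat])
```

A fixed iteration count keeps the sketch short. The procedure itself stops according to its own convergence criterion, which is not reproduced here.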
The "Model Information" table in Output 56.1.1 describes the method and options that the procedure would use if a positive number were specified in the NIMPUTE= option.
Model Information | |
---|---|
Data Set | WORK.FITNESS1 |
Method | MCMC |
Multiple Imputation Chain | Single Chain |
Initial Estimates for MCMC | EM Posterior Mode |
Start | Starting Value |
Prior | Jeffreys |
Number of Imputations | 0 |
Number of Burn-in Iterations | 200 |
Number of Iterations | 100 |
Seed for random number generator | 1518971 |
The "Missing Data Patterns" table in Output 56.1.2 lists distinct missing data patterns with corresponding frequencies and percentages. Here, a value of "X" means that the variable is observed in the corresponding group and a value of "." means that the variable is missing. The table also displays group-specific variable means.
Missing Data Patterns

Group | Oxygen | RunTime | RunPulse | Freq | Percent | Group Mean: Oxygen | Group Mean: RunTime | Group Mean: RunPulse |
---|---|---|---|---|---|---|---|---|
1 | X | X | X | 21 | 67.74 | 46.353810 | 10.809524 | 171.666667 |
2 | X | X | . | 4 | 12.90 | 47.109500 | 10.137500 | . |
3 | X | . | . | 3 | 9.68 | 52.461667 | . | . |
4 | . | X | X | 1 | 3.23 | . | 11.950000 | 176.000000 |
5 | . | X | . | 2 | 6.45 | . | 9.885000 | . |
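
The tally behind such a table can be sketched in a few lines of Python. This is only an illustration: the function name `missing_patterns` is hypothetical, missing values are encoded as `None`, and the sort order (observed before missing, column by column) is an assumption chosen to mimic the group ordering shown in the table. The rows use placeholder values, because only the missingness layout and the group sizes matter here.

```python
from collections import Counter


def missing_patterns(rows):
    """Return (pattern, frequency, percent) for each distinct missingness pattern."""
    n = len(rows)
    counts = Counter(
        tuple("." if v is None else "X" for v in row) for row in rows
    )
    # Assumed ordering: 'X' sorts before '.', compared column by column.
    ordered = sorted(counts.items(), key=lambda kv: [c != "X" for c in kv[0]])
    return [("".join(pat), freq, round(100.0 * freq / n, 2)) for pat, freq in ordered]


# Placeholder values (1.0) stand in for real measurements; only the
# pattern layout and the frequencies are taken from the table above.
rows = (
    [[1.0, 1.0, 1.0]] * 21
    + [[1.0, 1.0, None]] * 4
    + [[1.0, None, None]] * 3
    + [[None, 1.0, 1.0]] * 1
    + [[None, 1.0, None]] * 2
)
for pattern, freq, pct in missing_patterns(rows):
    print(pattern, freq, pct)
```

With 31 observations in total, the percentages are simply each frequency divided by 31, for example 21/31 for the complete cases in group 1.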
With the SIMPLE option, the procedure displays simple descriptive univariate statistics for available cases in the "Univariate Statistics" table in Output 56.1.3 and correlations from pairwise available cases in the "Pairwise Correlations" table in Output 56.1.4.
Univariate Statistics

Variable | N | Mean | Std Dev | Minimum | Maximum | Missing Count | Missing Percent |
---|---|---|---|---|---|---|---|
Oxygen | 28 | 47.11618 | 5.41305 | 37.38800 | 60.05500 | 3 | 9.68 |
RunTime | 28 | 10.68821 | 1.37988 | 8.63000 | 14.03000 | 3 | 9.68 |
RunPulse | 22 | 171.86364 | 10.14324 | 148.00000 | 186.00000 | 9 | 29.03 |
Pairwise Correlations

Variable | Oxygen | RunTime | RunPulse |
---|---|---|---|
Oxygen | 1.000000000 | -0.849118562 | -0.343961742 |
RunTime | -0.849118562 | 1.000000000 | 0.247258191 |
RunPulse | -0.343961742 | 0.247258191 | 1.000000000 |
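
"Pairwise available cases" means that each correlation uses every observation in which both variables are observed, so different entries of the table can be based on different subsets of the data. A minimal Python sketch of that idea follows. The function name `pairwise_corr` and the toy values are hypothetical, and missing values are encoded as `None`.

```python
import math


def pairwise_corr(x, y):
    """Pearson correlation over the cases where both x and y are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Toy values invented for illustration; they are not the Fitness1 data.
oxygen = [44.6, 45.3, 54.3, None, 39.4, 50.5]
runtime = [11.4, 10.1, 8.7, 11.6, None, 9.9]
print(round(pairwise_corr(oxygen, runtime), 4))
```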
When you use the EM statement, the MI procedure displays the initial parameter estimates for the EM algorithm in the "Initial Parameter Estimates for EM" table in Output 56.1.5.
Initial Parameter Estimates for EM

_TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
---|---|---|---|---|
MEAN | | 47.116179 | 10.688214 | 171.863636 |
COV | Oxygen | 29.301078 | 0 | 0 |
COV | RunTime | 0 | 1.904067 | 0 |
COV | RunPulse | 0 | 0 | 102.885281 |
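
The initial estimates shown here agree with the available-case statistics in the "Univariate Statistics" table: the means are the same, and each diagonal covariance entry matches the squared Std Dev to the displayed precision, with all covariances set to zero. The following Python sketch checks that arithmetic; it only re-uses the printed values and is not how the procedure computes them internally.

```python
# Std Dev values copied from the "Univariate Statistics" table.
std_dev = {"Oxygen": 5.41305, "RunTime": 1.37988, "RunPulse": 10.14324}
names = list(std_dev)

# Diagonal starting covariance: variances on the diagonal, zeros elsewhere.
init_cov = [
    [std_dev[r] ** 2 if r == c else 0.0 for c in names]
    for r in names
]

for name, row in zip(names, init_cov):
    print(name, [round(v, 4) for v in row])
```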
When you use the ITPRINT option in the EM statement, the "EM (MLE) Iteration History" table in Output 56.1.6 displays the iteration history for the EM algorithm.
EM (MLE) Iteration History

_Iteration_ | -2 Log L | Oxygen | RunTime | RunPulse |
---|---|---|---|---|
0 | 289.544782 | 47.116179 | 10.688214 | 171.863636 |
1 | 263.549489 | 47.116179 | 10.688214 | 171.863636 |
2 | 255.851312 | 47.139089 | 10.603506 | 171.538203 |
3 | 254.616428 | 47.122353 | 10.571685 | 171.426790 |
4 | 254.494971 | 47.111080 | 10.560585 | 171.398296 |
5 | 254.483973 | 47.106523 | 10.556768 | 171.389208 |
6 | 254.482920 | 47.104899 | 10.555485 | 171.385257 |
7 | 254.482813 | 47.104348 | 10.555062 | 171.383345 |
8 | 254.482801 | 47.104165 | 10.554923 | 171.382424 |
9 | 254.482800 | 47.104105 | 10.554878 | 171.381992 |
10 | 254.482800 | 47.104086 | 10.554864 | 171.381796 |
11 | 254.482800 | 47.104079 | 10.554859 | 171.381708 |
12 | 254.482800 | 47.104077 | 10.554858 | 171.381669 |
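
Each EM iteration cannot decrease the likelihood, so the -2 Log L column should never increase from one iteration to the next. The following Python sketch checks this for the values printed in the table and shows how quickly the improvements shrink. It is only a reading aid for the iteration history and does not reproduce the procedure's own stopping rule.

```python
# -2 Log L values copied from the iteration history, iterations 0 through 12.
neg2_log_l = [
    289.544782, 263.549489, 255.851312, 254.616428, 254.494971,
    254.483973, 254.482920, 254.482813, 254.482801, 254.482800,
    254.482800, 254.482800, 254.482800,
]

# Improvement achieved by each iteration (previous value minus current value).
drops = [prev - cur for prev, cur in zip(neg2_log_l, neg2_log_l[1:])]

for iteration, drop in enumerate(drops, start=1):
    print(f"iteration {iteration:2d}: decrease = {drop:.6f}")

print("non-increasing:", all(d >= 0 for d in drops))
```

Note that the mean estimates still change in the later decimal places after -2 Log L has stopped changing at six decimals, which is why the history continues for a few more iterations.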
The "EM (MLE) Parameter Estimates" table in Output 56.1.7 displays the maximum likelihood estimates for μ and Σ of a multivariate normal distribution from the data set Fitness1.
EM (MLE) Parameter Estimates

_TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
---|---|---|---|---|
MEAN | | 47.104077 | 10.554858 | 171.381669 |
COV | Oxygen | 27.797931 | -6.457975 | -18.031298 |
COV | RunTime | -6.457975 | 2.015514 | 3.516287 |
COV | RunPulse | -18.031298 | 3.516287 | 97.766857 |
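
To compare these estimates with the "Pairwise Correlations" table, you can convert the estimated covariance matrix to correlations by dividing each covariance by the product of the two standard deviations. The following Python sketch does this with the printed values; the variable names are illustrative only.

```python
import math

names = ["Oxygen", "RunTime", "RunPulse"]

# EM (MLE) covariance estimates copied from the table above.
cov = [
    [27.797931, -6.457975, -18.031298],
    [-6.457975, 2.015514, 3.516287],
    [-18.031298, 3.516287, 97.766857],
]

# Standard deviations are the square roots of the diagonal entries.
sd = [math.sqrt(cov[i][i]) for i in range(3)]
corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(3)] for i in range(3)]

for name, row in zip(names, corr):
    print(name, [round(v, 4) for v in row])
```

For example, the implied Oxygen and RunTime correlation is about -0.86, close to the pairwise available-case value of -0.849.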
You can also output the EM (MLE) parameter estimates to an output data set with the OUTEM= option. The following statements list the observations in the output data set outem:
    proc print data=outem;
       title 'EM Estimates';
    run;
The output data set outem in Output 56.1.8 is a TYPE=COV data set. The observation with _TYPE_=‘MEAN’ contains the MLE for the mean vector μ, and the observations with _TYPE_=‘COV’ contain the MLE for the covariance matrix Σ of a multivariate normal distribution from the data set Fitness1.
EM Estimates

Obs | _TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
---|---|---|---|---|---|
1 | MEAN | | 47.1041 | 10.5549 | 171.382 |
2 | COV | Oxygen | 27.7979 | -6.4580 | -18.031 |
3 | COV | RunTime | -6.4580 | 2.0155 | 3.516 |
4 | COV | RunPulse | -18.0313 | 3.5163 | 97.767 |