This example uses the propensity score method to impute missing values for variables in a data set with a monotone missing pattern. The following statements invoke the MI procedure and request the propensity score method. The resulting data set is named outex2.
    proc mi data=Fish1 seed=899603 out=outex2;
       monotone propensity;
       var Length1 Length2 Length3;
    run;
Note that the VAR statement is required, and the data set must have a monotone missing pattern for the variables in the order in which they are listed in the VAR statement.
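Before requesting a monotone method, it can help to confirm the pattern without imputing anything. As a minimal sketch, assuming the same Fish1 data set: specifying NIMPUTE=0 asks PROC MI to skip the imputation step, so that the missing data patterns can be inspected first.

```sas
/* Optional pre-check: display the missing data patterns only. */
/* NIMPUTE=0 skips the imputation, so no OUT= data set is requested. */
proc mi data=Fish1 nimpute=0;
   var Length1 Length2 Length3;
run;
```

If the patterns are not monotone for the variables in this order, reordering the VAR statement is the first thing to try.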
The "Model Information" table in Output 56.2.1 describes the method and options used in the multiple imputation process. By default, five imputations are created for the missing data.
Model Information | |
---|---|
Data Set | WORK.FISH1 |
Method | Monotone |
Number of Imputations | 5 |
Seed for random number generator | 899603 |
When monotone methods are used in the imputation, MONOTONE is displayed as the method. The "Monotone Model Specification" table in Output 56.2.2 displays the detailed model specification. By default, the observations are sorted into five groups based on their propensity scores.
Monotone Model Specification | |
---|---|
Method | Imputed Variables |
Propensity( Groups= 5) | Length2 Length3 |
Because no covariates are specified for the imputed variables Length2 and Length3, the variable Length1 is used as the covariate for Length2, and the variables Length1 and Length2 are used as covariates for Length3.
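The defaults described above can also be written out explicitly. The following is a sketch rather than part of the original example. It assumes that NIMPUTE=5 and NGROUPS=5 match the defaults reported in the tables, it names the covariates for each imputed variable, and it writes to a hypothetical data set name, outex2b, so that outex2 is not overwritten.

```sas
/* Sketch: the same request with defaults and covariates made explicit. */
/* outex2b is a hypothetical output name chosen for this illustration.  */
proc mi data=Fish1 seed=899603 nimpute=5 out=outex2b;
   monotone propensity(Length2 = Length1 / ngroups=5)
            propensity(Length3 = Length1 Length2 / ngroups=5);
   var Length1 Length2 Length3;
run;
```

Writing the specification this way makes it easier to change the number of groups or the covariate set for one variable without affecting the other.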
The "Missing Data Patterns" table in Output 56.2.3 lists distinct missing data patterns with corresponding frequencies and percentages. Here, values of "X" and "." indicate that the variable is observed or missing, respectively, in the corresponding group. The table confirms a monotone missing pattern for these three variables.
Missing Data Patterns | | | | | | | | |
---|---|---|---|---|---|---|---|---|
Group | Length1 | Length2 | Length3 | Freq | Percent | Group Mean: Length1 | Group Mean: Length2 | Group Mean: Length3 |
1 | X | X | X | 30 | 85.71 | 30.603333 | 33.436667 | 38.720000 |
2 | X | X | . | 3 | 8.57 | 29.033333 | 31.666667 | . |
3 | X | . | . | 2 | 5.71 | 27.750000 | . | . |
The imputation proceeds in three steps. First, the missing values of Length2 in group 3 are imputed using observed values of Length1. Next, the missing values of Length3 in group 2 are imputed using observed values of Length1 and Length2. Finally, the missing values of Length3 in group 3 are imputed using observed values of Length1 and imputed values of Length2.
After the completion of m imputations (here m = 5), the "Variance Information" table in Output 56.2.4 displays the between-imputation variance, within-imputation variance, and total variance for combining complete-data inferences. It also displays the degrees of freedom for the total variance. The relative increase in variance due to missingness, the fraction of missing information, and the relative efficiency for each variable are also displayed. A detailed description of these statistics is provided in the section Combining Inferences from Multiply Imputed Data Sets.
Variance Information | | | | | | | |
---|---|---|---|---|---|---|---|
Variable | Between Variance | Within Variance | Total Variance | DF | Relative Increase in Variance | Fraction Missing Information | Relative Efficiency |
Length2 | 0.001500 | 0.465422 | 0.467223 | 32.034 | 0.003869 | 0.003861 | 0.999228 |
Length3 | 0.049725 | 0.547434 | 0.607104 | 27.103 | 0.108999 | 0.102610 | 0.979891 |
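As a check on how these columns relate, the Length3 row can be reproduced by hand from the standard combining rules for m = 5 imputations. This is a worked sketch. The symbols B (between-imputation variance), W (within-imputation variance), T (total variance), r (relative increase in variance), and λ (fraction of missing information) are notation chosen here, and the value of λ is taken from the table rather than derived.

$$
\begin{aligned}
T &= W + \left(1 + \tfrac{1}{m}\right) B = 0.547434 + 1.2 \times 0.049725 = 0.607104,\\
r &= \frac{(1 + 1/m)\,B}{W} = \frac{0.059670}{0.547434} \approx 0.108999,\\
\mathrm{RE} &= \left(1 + \frac{\lambda}{m}\right)^{-1} = \left(1 + \frac{0.102610}{5}\right)^{-1} \approx 0.979891.
\end{aligned}
$$

The same arithmetic applied to the Length2 row agrees with the table up to rounding in the last displayed digit, because the between-imputation variance is shown to only six decimal places.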
The "Parameter Estimates" table in Output 56.2.5 displays the estimated mean and standard error of the mean for each variable. The inferences are based on t distributions. For each variable, the table also displays a 95% confidence interval for the mean and a t statistic with the associated p-value for testing the hypothesis that the population mean equals the value specified in the MU0= option, which is zero by default.
Parameter Estimates | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|
Variable | Mean | Std Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | DF | Minimum | Maximum | Mu0 | t for H0: Mean=Mu0 | Pr > \|t\| |
Length2 | 33.006857 | 0.683537 | 31.61460 | 34.39912 | 32.034 | 32.957143 | 33.060000 | 0 | 48.29 | <.0001 |
Length3 | 38.361714 | 0.779169 | 36.76328 | 39.96015 | 27.103 | 38.080000 | 38.545714 | 0 | 49.23 | <.0001 |
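The standard error in this table is the square root of the total variance reported in the "Variance Information" table, and the t statistic is the estimated mean minus Mu0, divided by that standard error. As a worked check for Length3, with the symbols chosen here for illustration:

$$
\mathrm{SE} = \sqrt{T} = \sqrt{0.607104} \approx 0.779169,
\qquad
t = \frac{\bar{Q} - \mu_0}{\mathrm{SE}} = \frac{38.361714 - 0}{0.779169} \approx 49.23.
$$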
The following statements list the first 10 observations of the data set outex2, as shown in Output 56.2.6. The missing values are imputed from observed values with similar propensity scores.
    proc print data=outex2(obs=10);
       title 'First 10 Observations of the Imputed Data Set';
    run;
First 10 Observations of the Imputed Data Set
Obs | _Imputation_ | Length1 | Length2 | Length3 |
---|---|---|---|---|
1 | 1 | 23.2 | 25.4 | 30.0 |
2 | 1 | 24.0 | 26.3 | 31.2 |
3 | 1 | 23.9 | 26.5 | 31.1 |
4 | 1 | 26.3 | 29.0 | 33.5 |
5 | 1 | 26.5 | 29.0 | 38.6 |
6 | 1 | 26.8 | 29.7 | 34.7 |
7 | 1 | 26.8 | 29.0 | 35.0 |
8 | 1 | 27.6 | 30.0 | 35.0 |
9 | 1 | 27.6 | 30.0 | 35.1 |
10 | 1 | 28.5 | 30.7 | 36.2 |
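Because outex2 identifies each completed data set by the _Imputation_ variable, per-imputation summaries can be obtained with BY-group processing. As a sketch that is not part of the original example, the following statements compute the mean and its standard error for each imputed variable within each imputation. The smallest and largest of these means should correspond to the Minimum and Maximum columns of the "Parameter Estimates" table.

```sas
/* Per-imputation means and standard errors.                        */
/* Assumes outex2 is ordered by _Imputation_, as PROC MI writes it. */
proc means data=outex2 mean stderr;
   by _Imputation_;
   var Length2 Length3;
run;
```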