The MIANALYZE Procedure

Getting Started: MIANALYZE Procedure

The Fitness data set contains measurements of 31 individuals in a physical fitness course (see Chapter 97: The REG Procedure, for more information). The Fitness1 data set is constructed from the Fitness data set and contains three variables: Oxygen, RunTime, and RunPulse. Some values have been set to missing, so the resulting data set has an arbitrary pattern of missingness in these three variables.

*----------------- Data on Physical Fitness -----------------*
| These measurements were made on men involved in a physical |
| fitness course at N.C. State University.                   |
| Only selected variables of                                 |
| Oxygen (oxygen intake, ml per kg body weight per minute),  |
| RunTime (time to run 1.5 miles in minutes), and            |
| RunPulse (heart rate while running) are used.              |
| Certain values were changed to missing for the analysis.   |
*------------------------------------------------------------*;
data Fitness1;
   input Oxygen RunTime RunPulse @@;
   datalines;
44.609  11.37  178     45.313  10.07  185
54.297   8.65  156     59.571    .      .
49.874   9.22    .     44.811  11.63  176
  .     11.95  176          .  10.85    .
39.442  13.08  174     60.055   8.63  170
50.541    .      .     37.388  14.03  186
44.754  11.12  176     47.273    .      .
51.855  10.33  166     49.156   8.95  180
40.836  10.95  168     46.672  10.00    .
46.774  10.25    .     50.388  10.08  168
39.407  12.63  174     46.080  11.17  156
45.441   9.63  164       .      8.92    .
45.118  11.08    .     39.203  12.88  168
45.790  10.47  186     50.545   9.93  148
48.673   9.40  186     47.920  11.50  170
47.467  10.50  170
;
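
Before imputing, it can help to confirm the missingness pattern described above. The following statements are a minimal sketch and are not part of the original example. They run PROC MI with NIMPUTE=0, which skips the imputation step, so that only the descriptive tables (including the "Missing Data Patterns" table) are produced:

```sas
/* Sketch: inspect missing data patterns without imputing.       */
/* NIMPUTE=0 skips imputation, so no OUT= data set is requested. */
proc mi data=Fitness1 nimpute=0;
   var Oxygen RunTime RunPulse;
run;
```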

Suppose that the data are multivariate normally distributed and that the missing data are missing at random (see the section Statistical Assumptions for Multiple Imputation in Chapter 75: The MI Procedure, for more information about these assumptions). The following statements use the MI procedure to impute missing values for the Fitness1 data set:

proc mi data=Fitness1 seed=3237851 noprint out=outmi;
   var Oxygen RunTime RunPulse;
run;

The MI procedure creates imputed data sets, which are stored in the Outmi data set. A variable named _Imputation_ identifies the imputation number of each observation. With m imputations, you can compute m different sets of point and variance estimates for a parameter. In PROC MI, m = 25 is the default.
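
As a quick check on the structure of Outmi, the following sketch (an illustration, not part of the original example) tabulates the _Imputation_ variable. Assuming the default m = 25 was used, each of the 25 imputation numbers should appear with 31 observations:

```sas
/* Sketch: count the observations in each imputed data set. */
proc freq data=outmi;
   tables _Imputation_ / nocum nopercent;
run;
```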

The following statements generate regression coefficients for each of the 25 imputed data sets:

proc reg data=outmi outest=outreg covout noprint;
   model Oxygen= RunTime RunPulse;
   by _Imputation_;
run;

The following statements display the parameter estimates and covariance matrices that PROC REG outputs for the first two imputed data sets (Figure 76.1):

proc print data=outreg(obs=8);
   var _Imputation_ _Type_ _Name_
      Intercept RunTime RunPulse;
   title 'Parameter Estimates from Imputed Data Sets';
run;

Figure 76.1: Parameter Estimates

Parameter Estimates from Imputed Data Sets

Obs	_Imputation_	_TYPE_	_NAME_	Intercept	RunTime	RunPulse
1	1	PARMS		86.544	-2.82231	-0.05873
2	1	COV	Intercept	100.145	-0.53519	-0.55077
3	1	COV	RunTime	-0.535	0.10774	-0.00345
4	1	COV	RunPulse	-0.551	-0.00345	0.00343
5	2	PARMS		83.021	-3.00023	-0.02491
6	2	COV	Intercept	79.032	-0.66765	-0.41918
7	2	COV	RunTime	-0.668	0.11456	-0.00313
8	2	COV	RunPulse	-0.419	-0.00313	0.00264
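
Each imputation contributes one PARMS row (the coefficient estimates) and three COV rows (their covariance matrix). Under the standard combining rules for multiple imputation, the combined point estimate is the average of the m estimates, and the between-imputation variance is their sample variance. The following statements are a sketch of how these quantities could be previewed directly from the Outreg data set. The use of PROC MEANS here is an illustration and is not how PROC MIANALYZE is invoked:

```sas
/* Sketch: summarize the m coefficient estimates across imputations.     */
/* Keep only the PARMS rows; the COV rows hold covariances, not          */
/* coefficients. VAR uses the divisor n-1, which is the divisor used for */
/* the between-imputation variance.                                      */
proc means data=outreg(where=(_Type_='PARMS')) n mean var min max;
   var Intercept RunTime RunPulse;
run;
```

The Mean, Variance, Minimum, and Maximum columns should correspond, up to rounding, to the combined estimate, between-imputation variance, minimum, and maximum that PROC MIANALYZE reports for each parameter.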

The following statements combine the 25 sets of regression coefficients:

proc mianalyze data=outreg;
   modeleffects Intercept RunTime RunPulse;
run;

The "Model Information" table in Figure 76.2 lists the input data set(s) and the number of imputations.

Figure 76.2: Model Information Table

The MIANALYZE Procedure

Model Information
Data Set	WORK.OUTREG
Number of Imputations	25

The "Variance Information" table in Figure 76.3 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences. It also displays the degrees of freedom for the total variance, the relative increase in variance due to missing values, the fraction of missing information, and the relative efficiency for each parameter estimate.

Figure 76.3: Variance Information Table

Variance Information (25 Imputations)
Parameter	Between Variance	Within Variance	Total Variance	DF	Relative Increase in Variance	Fraction Missing Information	Relative Efficiency
Intercept	22.485821	75.413875	98.799129	428.38	0.310092	0.240234	0.990482
RunTime	0.021126	0.124930	0.146902	1072.9	0.175870	0.151147	0.993990
RunPulse	0.000656	0.002622	0.003304	562.35	0.260376	0.209393	0.991694
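
The quantities in the "Variance Information" table are related through the standard combining rules. As a worked check, the following DATA step is a sketch that recomputes the Intercept row from its between-imputation and within-imputation variances. The variable names (B, W, T, r, lambda, re) are chosen here for illustration only:

```sas
/* Sketch: reproduce the Intercept row of the Variance Information table. */
data _null_;
   m = 25;                 /* number of imputations               */
   B = 22.485821;          /* between-imputation variance (table) */
   W = 75.413875;          /* within-imputation variance (table)  */

   T      = W + (1 + 1/m)*B;             /* total variance                  */
   r      = (1 + 1/m)*B / W;             /* relative increase in variance   */
   df     = (m - 1)*(1 + 1/r)**2;        /* degrees of freedom              */
   lambda = (r + 2/(df + 3)) / (r + 1);  /* fraction of missing information */
   re     = 1 / (1 + lambda/m);          /* relative efficiency             */

   put T= r= df= lambda= re=;            /* write the results to the log */
run;
```

The values written to the log should agree with the Intercept row of the table up to rounding of the inputs.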

The "Parameter Estimates" table in Figure 76.4 displays a combined estimate and standard error for each regression coefficient (parameter). Inferences are based on t distributions. The table displays a 95% confidence interval and a t test with the associated p-value for the hypothesis that the parameter is equal to the value specified with the THETA0= option (in this case, zero by default). The minimum and maximum parameter estimates from the imputed data sets are also displayed.
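
The null values and the confidence level can be changed in the PROC MIANALYZE statement. The following sketch tests a hypothetical null value of 90 for Intercept (chosen here only for illustration; the original example uses the default of zero for every parameter) and requests 90% confidence limits with the ALPHA= option:

```sas
/* Sketch: the THETA0= values follow the order of the effects in the */
/* MODELEFFECTS statement. The value 90 is a hypothetical null value */
/* and does not come from the original example.                      */
proc mianalyze data=outreg theta0=90 0 0 alpha=0.10;
   modeleffects Intercept RunTime RunPulse;
run;
```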

Figure 76.4: Parameter Estimates

Parameter Estimates (25 Imputations)
Parameter	Estimate	Std Error	Lower 95% Confidence Limit	Upper 95% Confidence Limit	DF	Minimum	Maximum	Theta0	t for H0: Parameter=Theta0	Pr > |t|
Intercept	92.700420	9.939775	73.16362	112.2372	428.38	83.020730	100.839807	0	9.33	<.0001
RunTime	-3.030325	0.383278	-3.78238	-2.2783	1072.9	-3.280042	-2.754668	0	-7.91	<.0001
RunPulse	-0.079621	0.057482	-0.19253	0.0333	562.35	-0.135862	-0.024910	0	-1.39	0.1666
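
The standard error, t statistic, p-value, and confidence limits follow from the combined estimate, the total variance, and the degrees of freedom. The following DATA step is a sketch that recomputes the Intercept row by using the values reported in the tables. The variable names are illustrative:

```sas
/* Sketch: reproduce the Intercept row of the Parameter Estimates table. */
data _null_;
   estimate  = 92.700420;  /* combined estimate (table)               */
   total_var = 98.799129;  /* total variance (Variance Information)   */
   df        = 428.38;     /* degrees of freedom (table)              */
   theta0    = 0;          /* null value (the default)                */
   alpha     = 0.05;       /* gives 95% confidence limits             */

   std_err = sqrt(total_var);                    /* standard error      */
   t_stat  = (estimate - theta0) / std_err;      /* t statistic         */
   p_value = 2*(1 - probt(abs(t_stat), df));     /* two-sided p-value   */
   t_crit  = tinv(1 - alpha/2, df);              /* t critical value    */
   lower   = estimate - t_crit*std_err;          /* lower limit         */
   upper   = estimate + t_crit*std_err;          /* upper limit         */

   put std_err= t_stat= p_value= lower= upper=;  /* write to the log */
run;
```

The values written to the log should match the Intercept row up to rounding of the inputs. The p-value is far below 0.0001, which the table displays as <.0001.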