Usage Note 49579: Using PROC MIANALYZE to combine estimates from a multinomial logistic model

Ordinal Response - Proportional Odds Model

The data below represent an experiment in which researchers tested four cheese additives and obtained 52 response ratings for each additive. Each response was measured on a scale of nine categories ranging from strong dislike (1) to excellent taste (9). The response is missing for six observations, so PROC MI is used to impute those values. Because the response is ordinal, the monotone logistic imputation method is used.

      data Cheese;
        input Additive y @@;
        datalines;
      1 3 1 4 1 4 1 4 1 4 1 4 1 4 1 4 1 5 
      1 5 1 5 1 5 1 5 1 5 1 5 1 5 1 6 1 6 
      1 6 1 6 1 6 1 6 1 6 1 6 1 7 1 7 1 7 
      1 . 1 7 1 7 1 7 1 7 1 7 1 7 1 7 1 7 
      1 7 1 7 1 7 1 7 1 7 1 7 1 7 1 8 1 8 
      1 8 1 8 1 8 1 8 1 8 1 8 1 9 2 1 2 1 
      2 1 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 
      2 2 2 2 2 2 2 2 2 3 2 3 2 3 2 3 2 3 
      2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 4 2 4
      2 4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2 4 
      2 5 2 5 2 5 2 5 2 5 2 5 2 5 2 6 2 6 
      2 . 2 6 2 6 2 6 2 7 3 1 3 2 3 3 3 3 
      3 3 3 3 3 3 3 3 3 4 3 4 3 4 3 4 3 4 
      3 4 3 4 3 4 3 5 3 5 3 5 3 5 3 5 3 5 
      3 5 3 5 3 5 3 5 3 5 3 5 3 5 3 5 3 5 
      3 5 3 5 3 5 3 5 3 5 3 5 3 5 3 5 3 . 
      3 6 3 6 3 6 3 6 3 6 3 6 3 7 3 7 3 7 
      3 7 3 7 3 8 4 4 4 5 4 5 4 5 4 6 4 6 
      4 6 4 6 4 6 4 6 4 6 4 7 4 7 4 7 4 7 
      4 7 4 7 4 7 4 7 4 7 4 7 4 7 4 7 4 7 
      4 7 4 8 4 8 4 8 4 8 4 8 4 8 4 8 4 . 
      4 8 4 8 4 8 4 8 4 8 4 8 4 8 4 8 4 . 
      4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 . 
      4 9 
      ;
      
      proc mi data=Cheese out=cheese_mi seed=1;
        class additive y;
        var additive y;
        monotone logistic (y=additive);
        run;
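
Before imputing, it can be useful to confirm where the missing responses fall. The following is only a minimal sketch that assumes the Cheese data set created above; it tabulates the number of observed and missing values of y for each additive, and the NMISS counts should add up to the six missing responses mentioned earlier.

```sas
/* Count observed (N) and missing (NMISS) responses for each additive */
proc means data=Cheese n nmiss;
  class Additive;
  var y;
run;
```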

The following statements fit a proportional odds model for each imputed data set. The OUTEST= and COVOUT options save the parameter estimates and the estimated covariance matrix of the estimates to a data set.

      proc logistic data=cheese_mi outest=ordinal_parms covout;
        by _imputation_;
        class additive;
        model y=additive;
        run;
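
As a quick check on the structure of the OUTEST= data set, the rows can be tabulated by imputation and row type. This is a sketch rather than a required step; it assumes the ordinal_parms data set created above and relies on the _TYPE_ variable distinguishing the parameter estimates (PARMS) from the covariance rows (COV) that the COVOUT option adds.

```sas
/* Each imputation is expected to contribute one PARMS row
   plus one COV row for every parameter in the model */
proc freq data=ordinal_parms;
  tables _imputation_*_type_ / nocol norow nopercent;
run;
```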

It is necessary to understand the naming convention that PROC LOGISTIC uses in naming the variables containing the parameter estimates in the OUTEST= data set so they can be specified in the MODELEFFECTS statement in PROC MIANALYZE. The intercept variables are named Intercept_xxx, where xxx is the value (formatted if a format is applied) of the corresponding response category.

For continuous explanatory variables, the variable names containing the parameters are the same as the corresponding model variables. For CLASS variables, the variable names are obtained by concatenating the corresponding CLASS variable name with the CLASS category. For interaction and nested effects, the parameter names are created by concatenating the names of each effect. See "Input and Output Data Sets: Parameter Names in the OUTEST= Data Set" in the Details section of the PROC LOGISTIC documentation for more details.

The names of the variables containing the parameter estimates are easily seen using PROC PRINT or PROC CONTENTS. The following statements display the parameter estimates from the first imputed data set. Note the names of variables containing the parameter estimates.

      proc print data=ordinal_parms noobs;
        where _imputation_=1 and _TYPE_="PARMS";
        var int: add:;
        title 'Parameter Estimates for the First Imputation';
        run;

Parameter Estimates for the First Imputation

Intercept_1	Intercept_2	Intercept_3	Intercept_4	Intercept_5	Intercept_6	Intercept_7	Intercept_8	Additive1	Additive2	Additive3
-4.61011	-3.55410	-2.45286	-1.38290	0.043577	0.97785	2.55390	4.15586	-0.91614	2.49980	0.83682
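
The parameter variable names can also be collected programmatically instead of being copied from the listing. The sketch below is one possible approach, not part of the original note: it assumes that ordinal_parms is stored in the WORK library and that all parameter variables begin with "Intercept" or "Additive". The macro variable name parmlist is made up for illustration. The resulting space-separated list of individual names could then be supplied wherever the names must be spelled out, such as in the MODELEFFECTS statement.

```sas
/* Collect the parameter variable names, in data set order,
   into the (hypothetical) macro variable PARMLIST */
proc sql noprint;
  select name into :parmlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'ORDINAL_PARMS'
      and (upcase(name) like 'INTERCEPT%' or upcase(name) like 'ADDITIVE%')
    order by varnum;
quit;

/* Write the list to the log for inspection */
%put NOTE: Parameter variables: &parmlist;
```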

PROC MIANALYZE can now be used to combine the results from the imputed data sets. The parameter variables are individually listed in the MODELEFFECTS statement. Note that variable lists such as intercept_1-intercept_8 or int: cannot be used. The EDF= option is also specified because the calculated degrees of freedom far exceed the complete data degrees of freedom. In this case it is set to 197 which is the number of observations (208) minus the number of parameters (11).

      proc mianalyze data=ordinal_parms edf=197;
        modeleffects intercept_1 intercept_2 intercept_3 intercept_4 
                     intercept_5 intercept_6 intercept_7 intercept_8 
                     additive1 additive2 additive3;
        run;

Parameter Estimates
Parameter	Estimate	Std Error	95% Confidence Limits		DF	Minimum	Maximum	Theta0	t for H0: Parameter=Theta0	Pr > |t|
intercept_1	-4.619201	0.429829	-5.46674	-3.77166	201.31	-4.642194	-4.567857	0	-10.75	<.0001
intercept_2	-3.546659	0.311251	-4.16049	-2.93282	195.81	-3.584784	-3.500469	0	-11.39	<.0001
intercept_3	-2.447092	0.244062	-2.92845	-1.96573	193.59	-2.480908	-2.412149	0	-10.03	<.0001
intercept_4	-1.349380	0.202164	-1.74808	-0.95068	195.75	-1.382898	-1.319215	0	-6.67	<.0001
intercept_5	0.057492	0.183550	-0.30465	0.41963	183.69	0.010079	0.094214	0	0.31	0.7545
intercept_6	0.972829	0.191086	0.59596	1.34970	194.53	0.938073	1.007891	0	5.09	<.0001
intercept_7	2.467957	0.246718	1.98076	2.95516	161.96	2.419185	2.553901	0	10.00	<.0001
intercept_8	4.107772	0.365795	3.38640	4.82914	197.4	4.056246	4.155857	0	11.23	<.0001
additive1	-0.912614	0.231845	-1.36985	-0.45538	195.66	-0.931951	-0.861789	0	-3.94	0.0001
additive2	2.515787	0.275572	1.97227	3.05931	193.01	2.454894	2.542470	0	9.13	<.0001
additive3	0.812844	0.228925	0.36143	1.26426	200.42	0.789217	0.836817	0	3.55	0.0005
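
For reference, the combining rules behind this table can be summarized as follows. This is a sketch based on the standard multiple-imputation formulas (Rubin's rules with the Barnard and Rubin degrees-of-freedom adjustment) as they are described in the PROC MIANALYZE documentation. The symbols are notation introduced here: m is the number of imputations, \hat{Q}_i and \hat{W}_i are the estimate and its variance from the i-th imputed data set, and \nu_0 is the complete-data degrees of freedom given in the EDF= option.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(combined estimate)}\\
\bar{W} &= \frac{1}{m}\sum_{i=1}^{m}\hat{W}_i
  && \text{(within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)}\\
T &= \bar{W} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)}\\
\nu_m &= (m-1)\left[1+\frac{\bar{W}}{(1+m^{-1})\,B}\right]^{2}
  && \text{(unadjusted degrees of freedom)}\\
\gamma &= \frac{(1+m^{-1})\,B}{T},\qquad
\hat{\nu}_{\mathrm{obs}} = (1-\gamma)\,\frac{\nu_0(\nu_0+1)}{\nu_0+3}\\
\nu_m^{*} &= \left[\frac{1}{\nu_m}+\frac{1}{\hat{\nu}_{\mathrm{obs}}}\right]^{-1}
  && \text{(adjusted degrees of freedom)}
\end{align*}
```

Under this reading, the Estimate column corresponds to \bar{Q}, the Std Error column to \sqrt{T}, and the DF column to \nu_m^{*} when EDF= is specified. When the between-imputation variance B is small relative to \bar{W}, the unadjusted \nu_m becomes very large, which is why a complete-data value is supplied.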

Nominal Response - Generalized Logit Model

For the purpose of illustration, missing values are introduced into the variables Program and Style in the school instruction style data set that is used to demonstrate how PROC LOGISTIC fits a generalized logit model.

      data school;
        length Program $ 9;
        input School Program $ Style $ NumStudent Count @@; 
        datalines; 
      1 regular   self 21 10  1 regular   team 22 17  1 regular   class 16 26
      1 afternoon self 23  5  1 afternoon team 26 12  1 afternoon class 21 50 
      2 .         self 22 21  2 regular   team 31 17  2 regular   class 32 26
      2 .         .    18 16  2 afternoon team 28 12  2 afternoon class 27 36 
      3 regular   self 14 15  3 regular   team 32 15  3 regular   class 31 16
      3 afternoon self 19 12  3 afternoon team 30 12  3 .         class 33 20 
      ;

PROC MI is used to impute the missing values. The DISCRIM method is used for Style since it is nominal with three levels and the LOGISTIC method is used for Program since it has two levels.

      proc mi data=school out=school_imp;
        freq count;
        class school style program;
        var NumStudent school style program;
        monotone discrim (style=NumStudent);
        monotone logistic (program=school style);
        title 'Proc MI results for monotone Logistic model';
        run;
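
The MONOTONE statement requires a monotone missing pattern in the order given by the VAR statement: whenever a variable is missing, every variable listed after it must be missing as well. Here that means an observation with a missing Style must also have a missing Program. The sketch below is an optional check, not part of the original note; it assumes the school data set created above, and the counter name violations is made up for illustration. For the data shown, the only row with a missing Style also has a missing Program, so the count written to the log should be zero.

```sas
/* Count rows that break the monotone pattern
   NumStudent -> School -> Style -> Program */
data _null_;
  set school end=last;
  /* Sum statement: VIOLATIONS starts at 0 and is retained across rows */
  violations + (missing(Style) and not missing(Program));
  if last then put 'Rows violating the monotone pattern: ' violations;
run;
```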

These statements fit a generalized logit model to each of the imputed data sets using PROC LOGISTIC. The OUTEST= and COVOUT options create a data set containing the parameter estimates and the estimated covariance matrix of the estimates.

      proc logistic data=school_imp outest=imp_parms covout;
        by _imputation_;
        freq Count; 
        class School Program(ref=first);
        model Style(order=data)=School Program NumStudent / link=glogit;
        run;

The parameter variables needed for the MODELEFFECTS statement in PROC MIANALYZE are named as described above for the ordinal model. However, for the generalized logit model, names of parameters corresponding to each nonreference category contain _xxx as the suffix, where xxx is the value (formatted if a format is applied) of the corresponding nonreference category. See "Input and Output Data Sets: Parameter Names in the OUTEST= Data Set" in the Details section of the PROC LOGISTIC documentation for more details and an example. As before, the names can be displayed using PROC PRINT.

      proc print data=imp_parms noobs;
        where _imputation_=1 and _TYPE_="PARMS";
        var int: sch: pro: num:;
        title 'Parameter Estimates for the First Imputation';
        run;

Parameter Estimates for the First Imputation

Intercept_self	Intercept_team	School1_self	School1_team	School2_self	School2_team	Programregular_self	Programregular_team	NumStudent_self	NumStudent_team
4.82870	-10.7580	-1.30866	1.81873	-0.036896	-0.84613	-0.27715	-0.21415	-0.26243	0.36837

The following statements use PROC MIANALYZE to combine the results from the imputed data sets. As discussed in the previous example, the individual parameters are specified in the MODELEFFECTS statement and the EDF= option is also specified and set to 328 which is the number of observations (338) minus the number of parameters (10).

      proc mianalyze data=imp_parms edf=328;
        modeleffects Intercept_self      Intercept_team 
                     School1_self        School1_team 
                     School2_self        School2_team 
                     Programregular_self Programregular_team 
                     NumStudent_self     NumStudent_team ;
        run;

Parameter Estimates
Parameter	Estimate	Std Error	95% Confidence Limits		DF	Minimum	Maximum	Theta0	t for H0: Parameter=Theta0	Pr > |t|
Intercept_self	7.915859	2.261504	2.6055	13.22620	7.2508	4.828695	9.086727	0	3.50	0.0094
Intercept_team	-7.613400	5.067041	-20.8581	5.63126	4.7383	-10.757964	-0.090866	0	-1.50	0.1964
School1_self	-2.202207	0.739524	-3.9822	-0.42222	6.4403	-2.654357	-1.308659	0	-2.98	0.0227
School1_team	1.201555	1.126320	-1.7116	4.11474	4.8999	-0.485101	1.818729	0	1.07	0.3358
School2_self	0.749434	0.725993	-1.0815	2.58032	5.3417	-0.036896	1.232299	0	1.03	0.3464
School2_team	-0.545767	0.559290	-1.9547	0.86315	5.363	-0.846132	0.276731	0	-0.98	0.3711
Programregular_self	0.024919	0.771680	-2.0730	2.12288	4.2267	-1.017447	0.569102	0	0.03	0.9757
Programregular_team	0.108956	0.338419	-0.7236	0.94147	5.8709	-0.214147	0.336845	0	0.32	0.7586
NumStudent_self	-0.388506	0.094675	-0.6069	-0.17008	7.9787	-0.432170	-0.262428	0	-4.10	0.0034
NumStudent_team	0.253699	0.183384	-0.2260	0.73342	4.7266	-0.018478	0.368370	0	1.38	0.2283

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		Z64
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Pro
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> Categorical Data Analysis; Analytics ==> Missing Value Imputation; SAS Reference ==> Procedures ==> LOGISTIC; SAS Reference ==> Procedures ==> MI; SAS Reference ==> Procedures ==> MIANALYZE

Date Modified:	2013-07-31 16:18:57
Date Created:	2013-04-03 14:40:01
