The generalized logit model is commonly used to model a nominal multinomial response, that is, a multilevel response whose levels have no inherent ordering. You can use PROC LOGISTIC to fit the generalized logit model by specifying the LINK=GLOGIT option in the MODEL statement. The following also applies when fitting the generalized logit model to repeated measures data using PROC GEE.
This model is illustrated in the example titled "Nominal Response Data: Generalized Logits Model" in the Examples section of the LOGISTIC documentation. In that example, third graders from three different schools (SCHOOL) were exposed to three different styles (STYLE) of mathematics instruction: a self-paced computer-learning style, a team approach, and a traditional class approach. The students were asked which style they preferred, and their responses were classified by the type of program (PROGRAM) they were in (a regular school day versus a regular day supplemented with an afternoon school program). The final model includes main effects for SCHOOL and PROGRAM. The data set is from the text by Stokes, Davis, and Koch. The following statements fit the final model.
data school;
   length Program $ 9;
   input School Program $ Style $ Count @@;
   datalines;
1 regular self 10 1 regular team 17 1 regular class 26
1 afternoon self 5 1 afternoon team 12 1 afternoon class 50
2 regular self 21 2 regular team 17 2 regular class 26
2 afternoon self 16 2 afternoon team 12 2 afternoon class 36
3 regular self 15 3 regular team 15 3 regular class 16
3 afternoon self 12 3 afternoon team 12 3 afternoon class 20
;
proc logistic data=school outest=ParmNames;
   freq Count;
   class School Program(ref=first);
   model Style(order=data)=School Program / link=glogit;
run;
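As a point of reference, the model fit by these statements can be written as two logits, each comparing one style to the reference style. This is only a sketch: the symbols below are notation introduced here for illustration and do not appear in the SAS output. CLASS is taken as the reference response level because it is the last level under ORDER=DATA.

```latex
\log\frac{\pi_{j}(s,p)}{\pi_{\text{class}}(s,p)}
  = \alpha_{j} + \beta_{j,s} + \gamma_{j,p},
\qquad j \in \{\text{self},\ \text{team}\},\quad
s \in \{1,2,3\},\quad
p \in \{\text{regular},\ \text{afternoon}\}
```

Here \pi_j(s,p) is the probability that a student in school s and program p prefers style j. Under the default effect coding, the SCHOOL parameters sum to zero within each logit, and likewise for PROGRAM, so only two SCHOOL parameters and one PROGRAM parameter are estimated per logit.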
The "Type 3 Analysis of Effects" table provides a 4 degrees of freedom (DF) test for SCHOOL, which indicates that there are differences among the schools. The "Analysis of Maximum Likelihood Estimates" table provides the parameter estimates for the fitted model.
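Type 3 Analysis of Effects
Effect | DF | Wald Chi-Square | Pr > ChiSq |
School | 4 | 14.8424 | 0.0050 |
Program | 2 | 10.9160 | 0.0043 |

Analysis of Maximum Likelihood Estimates
Parameter | Style | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
Intercept | self | 1 | -0.7978 | 0.1465 | 29.6502 | <.0001 |
Intercept | team | 1 | -0.6589 | 0.1367 | 23.2300 | <.0001 |
School 1 | self | 1 | -0.7992 | 0.2198 | 13.2241 | 0.0003 |
School 1 | team | 1 | -0.2786 | 0.1867 | 2.2269 | 0.1356 |
School 2 | self | 1 | 0.2836 | 0.1899 | 2.2316 | 0.1352 |
School 2 | team | 1 | -0.0985 | 0.1892 | 0.2708 | 0.6028 |
Program regular | self | 1 | 0.3737 | 0.1410 | 7.0272 | 0.0080 |
Program regular | team | 1 | 0.3713 | 0.1353 | 7.5332 | 0.0061 |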
To further investigate the nature of the school differences, suppose you are interested in additional tests of these hypotheses:
1. School 1 differs from School 2 on the SELF vs. CLASS logit
2. School 1 differs from School 2 on the TEAM vs. CLASS logit
3. There is an overall difference between Schools 1 and 2
Hypothesis 1 entails testing the difference between the SCHOOL 1 parameter on the SELF logit (-0.7992) and the SCHOOL 2 parameter on the SELF logit (0.2836). Similarly, hypothesis 2 compares the SCHOOL 1 and 2 parameters on the TEAM logit (-0.2786 vs -0.0985). Each of these tests has 1 DF. The third hypothesis is simply a joint test of hypotheses 1 and 2 and has 2 DF.
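The two parameter differences described above are simple arithmetic on the estimates. The following Python sketch is not part of the SAS analysis; the dictionary and function names are made up for illustration, and the numbers are copied from the "Analysis of Maximum Likelihood Estimates" table.

```python
# Effect-coded SCHOOL estimates copied from the
# "Analysis of Maximum Likelihood Estimates" table.
school_params = {
    "self": {"School1": -0.7992, "School2": 0.2836},
    "team": {"School1": -0.2786, "School2": -0.0985},
}


def school_1_vs_2(logit):
    """Return the School 1 minus School 2 parameter difference for one logit."""
    params = school_params[logit]
    return params["School1"] - params["School2"]


for logit in school_params:
    # Hypothesis 1 is the "self" difference; hypothesis 2 is the "team" difference.
    print(f"{logit}: {school_1_vs_2(logit):.4f}")
```

These differences are the quantities that the statements discussed later estimate and test within each logit.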
Typically, tests of linear combinations of model parameters can be conducted using the CONTRAST statement. In the CONTRAST statement, you refer to parameters by using the name of the corresponding effect (such as SCHOOL) in the model. This works as long as there is only one parameter associated with each level of a CLASS effect. But in a multinomial model there are multiple parameters for each level of a CLASS effect. For instance, in the above model there are two parameters for each level of SCHOOL because there are two logits being modeled, so a contrast on SCHOOL applies to both logits at once and cannot isolate a single logit as hypotheses 1 and 2 require. Had the response been binary rather than multinomial, you could use the CONTRAST statement to test the hypotheses because there would have been only one parameter per SCHOOL level.
However, for simple pairwise comparisons such as these three hypotheses, it is easiest to use the LSMEANS and LSMESTIMATE statements. While these hypotheses can also be tested using the TEST and ESTIMATE statements as shown below, the TEST statement requires you to know the names of the parameters that PROC LOGISTIC uses internally, and the ESTIMATE statement requires you to determine the appropriate contrast coefficients on the model parameters. The ODDSRATIO statement provides odds ratio estimates and can also test hypotheses within each logit when combined with the ORPVALUE option in the MODEL statement.
Using the LSMEANS, LSMESTIMATE, and ODDSRATIO statements
The following statements fit the model and test the three hypotheses of interest. Note that in order to use the LSMEANS and LSMESTIMATE statements, you must use GLM coding (parameterization) for the CLASS variables.
The DIFF option in the LSMEANS statement provides comparisons of the schools within each of the two generalized logits. When odds ratio estimates are also desired, they are provided by adding the ODDSRATIO option. Confidence intervals for the odds ratios are included if you also specify the CL option.
The LSMESTIMATE statement enables you to get an overall comparison of schools 1 and 2 by providing a contrast of the LS-means. Particularly for more complex models, determining this contrast is vastly simpler than determining the correct contrast directly on the model parameters. The CATEGORY=JOINT option provides tests comparing the two schools in each logit (1 DF each) and the JOINT option provides a joint test comparing the two schools overall (2 DF). Since differences in the LS-means are differences of logits, adding the EXP option in the LSMESTIMATE statement provides estimates of odds ratios for each pair of response levels. Again, confidence intervals are provided if you also specify the CL option.
The ODDSRATIO statement provides odds ratio estimates for each pair of schools on each of the logits. When you specify the ORPVALUE option in the MODEL statement, a p-value is included with the odds ratios.
proc logistic data=school;
   freq Count;
   class School Program(ref=first) / param=glm;
   model Style(order=data)=School Program / link=glogit orpvalue;
   lsmeans school / diff oddsratio;
   lsmestimate school '1 vs 2 overall' 1 -1 0 / category=joint joint exp;
   oddsratio school;
run;
Following are the results from the LSMEANS statement. The p-value for hypothesis 1 is 0.0022, indicating a significant difference between Schools 1 and 2 on the first (self) logit. However, the two schools are not significantly different on the second (team) logit (p=0.5702). The estimates of the odds ratios comparing the two schools appear in the Odds Ratio column. The estimate 0.339 is the odds of preferring self over class in School 1 relative to the same odds in School 2. Similarly, the estimate 0.835 compares the two schools on the odds of team vs. class.
School Least Squares Means
Style | School | Estimate | Standard Error | z Value | p-Value |
self | 1 | -1.5970 | 0.2849 | -5.61 | <.0001 |
self | 2 | -0.5142 | 0.2107 | -2.44 | 0.0147 |
self | 3 | -0.2823 | 0.2581 | -1.09 | 0.2739 |
team | 1 | -0.9375 | 0.2210 | -4.24 | <.0001 |
team | 2 | -0.7574 | 0.2279 | -3.32 | 0.0009 |
team | 3 | -0.2819 | 0.2580 | -1.09 | 0.2746 |

Differences of School Least Squares Means
Style | School | _School | Estimate | Standard Error | z Value | p-Value | Odds Ratio |
self | 1 | 2 | -1.0828 | 0.3539 | -3.06 | 0.0022 | 0.339 |
self | 1 | 3 | -1.3147 | 0.3839 | -3.42 | 0.0006 | 0.269 |
self | 2 | 3 | -0.2319 | 0.3327 | -0.70 | 0.4858 | 0.793 |
team | 1 | 2 | -0.1801 | 0.3172 | -0.57 | 0.5702 | 0.835 |
team | 1 | 3 | -0.6556 | 0.3395 | -1.93 | 0.0535 | 0.519 |
team | 2 | 3 | -0.4755 | 0.3436 | -1.38 | 0.1665 | 0.622 |
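The Estimate and Odds Ratio columns of the differences can be checked by hand: each difference is one LS-mean minus another within the same logit, and the odds ratio is its exponential. The following Python sketch is not SAS output; the names are illustrative, and the LS-means are copied from the LSMEANS results as rounded to four decimals, so the last digit can differ slightly from the printed values.

```python
import math
from itertools import combinations

# LS-means (logit scale) copied from the LSMEANS results, keyed by logit and school.
ls_means = {
    "self": {1: -1.5970, 2: -0.5142, 3: -0.2823},
    "team": {1: -0.9375, 2: -0.7574, 3: -0.2819},
}


def pairwise(logit):
    """Yield (school_a, school_b, difference, odds_ratio) for each pair of schools."""
    means = ls_means[logit]
    for a, b in combinations(sorted(means), 2):
        diff = means[a] - means[b]      # difference of two logits
        yield a, b, diff, math.exp(diff)  # exponentiating gives an odds ratio


for logit in ls_means:
    for a, b, diff, odds_ratio in pairwise(logit):
        print(f"{logit}: School {a} vs {b}  diff={diff:.4f}  OR={odds_ratio:.3f}")
```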
Next are the results from the LSMESTIMATE statement. Note that there are six LS-means in the "SCHOOL Least Squares Means" table above – one for each school on each of the two logits. To test hypothesis 3, we want to compare the two school 1 LS-means to the two school 2 LS-means. The contrast coefficients in the LSMESTIMATE statement apply to the LS-means using the order in which they appear in the table. Note that the comparisons of the two schools in each logit match those from the DIFF and ODDSRATIO options in the LSMEANS statement. The overall comparison of the two schools with 2 DF indicates a significant difference (p=0.0089).
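Least Squares Means Estimates
Label | Style | Estimate | Standard Error | z Value | p-Value | Exponentiated |
1 vs 2 overall | self | -1.0828 | 0.3539 | -3.06 | 0.0022 | 0.3386 |
1 vs 2 overall | team | -0.1801 | 0.3172 | -0.57 | 0.5702 | 0.8352 |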
Finally, the ODDSRATIO statement and the ORPVALUE option provide the same tests of pairwise school differences in the separate logits. Again, the comparisons for schools 1 and 2 match those from the LSMEANS and LSMESTIMATE statements. Odds ratio estimates are also provided and match those obtained from the LSMEANS and LSMESTIMATE statements above.
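Odds Ratio Estimates and Wald Confidence Intervals
Style | Comparison | Estimate | Lower 95% Limit | Upper 95% Limit | p-Value |
self | School 1 vs 2 | 0.339 | 0.169 | 0.678 | 0.0022 |
team | School 1 vs 2 | 0.835 | 0.449 | 1.555 | 0.5702 |
self | School 1 vs 3 | 0.269 | 0.127 | 0.570 | 0.0006 |
team | School 1 vs 3 | 0.519 | 0.267 | 1.010 | 0.0535 |
self | School 2 vs 3 | 0.793 | 0.413 | 1.522 | 0.4858 |
team | School 2 vs 3 | 0.622 | 0.317 | 1.219 | 0.1665 |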
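The confidence limits for the School 1 vs. 2 odds ratios appear consistent with ordinary 95% Wald limits computed on the logit scale and then exponentiated. The following Python sketch assumes that construction (a two-sided 95% level and the normal quantile 1.959964); the function name is illustrative, and the inputs are the rounded estimates and standard errors reported for the LS-mean differences.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal (assumed 95% level)


def wald_odds_ratio_ci(estimate, std_err, z=Z_975):
    """Return (odds_ratio, lower, upper) from a logit-scale estimate and its standard error."""
    lower = math.exp(estimate - z * std_err)
    upper = math.exp(estimate + z * std_err)
    return math.exp(estimate), lower, upper


# School 1 vs 2 differences (estimate, standard error) on each logit.
for logit, (est, se) in {"self": (-1.0828, 0.3539), "team": (-0.1801, 0.3172)}.items():
    odds_ratio, lower, upper = wald_odds_ratio_ci(est, se)
    print(f"{logit}: OR={odds_ratio:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

An interval that contains 1, as the team interval does, corresponds to a nonsignificant difference at the same level.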
Using the TEST, ESTIMATE, and CONTRAST statements
You can also test hypotheses like the first two above by using the TEST statement. Beginning in SAS 9.2 TS2M3, you can also use the ESTIMATE statement. In the TEST statement, hypotheses are specified using the unique names associated with the individual parameters in the model. The easiest way to determine the names of the parameters is to include the OUTEST= option when fitting the model, as in the first PROC LOGISTIC step above. This option saves the parameters in a data set. You can then print the data set to display the parameter values and their names. Optionally, a PROC TRANSPOSE step can arrange the parameters vertically rather than horizontally, which makes the display more readable.
proc transpose data=ParmNames;
run;
proc print noobs;
run;
The _NAME_ column contains the parameter estimate names. With these names you can construct TEST statements to test the above hypotheses.
_NAME_ | _LABEL_ | Value |
Intercept_self | Intercept: Style=self | -0.798 |
Intercept_team | Intercept: Style=team | -0.659 |
School1_self | School 1: Style=self | -0.799 |
School1_team | School 1: Style=team | -0.279 |
School2_self | School 2: Style=self | 0.284 |
School2_team | School 2: Style=team | -0.098 |
Programregular_self | Program regular: Style=self | 0.374 |
Programregular_team | Program regular: Style=team | 0.371 |
_LNLIKE_ | Model Log Likelihood | -333.467 |
Note that unlike the CONTRAST statement, the TEST statement uses a label at the beginning of the statement to label the results rather than a quoted string embedded in the statement. This label must be constructed just as you would create a valid SAS variable name: it must be 32 characters or fewer, contain only letters, numbers, or underscores, and begin with a letter or underscore.
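If you generate TEST labels programmatically, a quick check of the naming rule can catch invalid labels before the statements are submitted. The following Python sketch is only an illustration, assuming the usual SAS name rule (a letter or underscore first, then letters, digits, or underscores, at most 32 characters in total); the function name is hypothetical.

```python
import re

# Assumed SAS name rule: letter or underscore first, then up to 31 more
# letters, digits, or underscores (32 characters in total).
_SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_test_label(label):
    """Return True if the label follows the assumed SAS variable-name rule."""
    return _SAS_NAME.fullmatch(label) is not None


for label in ["School_1_vs_2_on_Self", "School 1 vs 2", "1_vs_2"]:
    print(f"{label!r}: {is_valid_test_label(label)}")
```

The quoted strings used as CONTRAST and ESTIMATE labels are not subject to this rule, which is why 'School 1 vs 2' is acceptable there but not as a TEST label.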
The ESTIMATE statement can be used to test hypotheses in specific logits. In the following code, the first two TEST statements are each followed by an equivalent ESTIMATE statement for testing hypotheses 1 and 2. Note that the CATEGORY= option specifies the logit to which the contrast refers. Each of the next two ESTIMATE statements reproduces both of the first two ESTIMATE statements. The CATEGORY=JOINT option in the first requests that the contrast be tested in each of the logits. The second ESTIMATE statement specifies two contrast matrix rows (in this case, the same contrast is specified in each row), and the CATEGORY="self","team" option indicates that the contrast in the first row is tested in the "self" logit and the contrast in the second row is tested in the "team" logit.
Hypothesis 3 is a 2 DF test since it tests one school difference in both of the logits. Adding the JOINT option to the previous ESTIMATE statement with CATEGORY=JOINT provides a joint test of both contrasts. The ONLY suboption displays only the joint test and suppresses the test in each logit. Since this hypothesis tests the same contrast in all logits, the CONTRAST statement can also perform the test, and that is shown next.
A test of no differences among the three schools on all logits is automatically provided in the "Type 3 Analysis of Effects" table produced by PROC LOGISTIC. This test can be replicated with the TEST, ESTIMATE, and CONTRAST statements. The last TEST statement provides a joint test that Schools 1 and 2 do not differ on either logit, and that Schools 1 and 3 also do not differ on either logit. This 4 DF test is an overall test of the SCHOOL effect and replicates the Type 3 test for SCHOOL (see the Note at the end). The same test is done in the ESTIMATE statement by specifying each contrast, matched in order with the logit name in the CATEGORY= option. The JOINT(ONLY) option again provides just the joint test of these four contrasts (4 DF). The CONTRAST statement provides a joint test of the specified two-row contrast matrix for all logits.
In each of these ESTIMATE statements that define a difference in two logits, the EXP option can be added to estimate the corresponding odds ratio. The CL option can also be added to obtain a confidence interval.
proc logistic data=school;
   freq Count;
   class School Program(ref=first);
   model Style(order=data)=School Program / link=glogit;
   /* Test hypothesis 1 */
   School_1_vs_2_on_Self: test School1_self - School2_self;
   estimate 'School 1 vs 2' school 1 -1 / category="self";
   /* Test hypothesis 2 */
   School_1_vs_2_on_Team: test School1_team - School2_team;
   estimate 'School 1 vs 2' school 1 -1 / category="team";
   /* Test both hypotheses 1 and 2 -- two ways */
   estimate 'School 1 vs 2' school 1 -1 / category=joint;
   estimate 'School 1 vs 2' school 1 -1,
            'School 1 vs 2' school 1 -1 / category="self","team";
   /* Test hypothesis 3 */
   School_1_vs_2_overall: test School1_self - School2_self, School1_team - School2_team;
   estimate 'School 1 vs 2 overall' school 1 -1 / category=joint joint(only);
   contrast 'School 1 vs 2 overall' school 1 -1;
   /* Test overall school effect -- no school differences on any logit */
   School_overall: test School1_self - School2_self, School1_team - School2_team,
                        2*School1_self + School2_self, 2*School1_team + School2_team;
   estimate 'School overall' school 1 -1, school 1 -1, school 2 1, school 2 1 /
            category="self","team","self","team" joint(only);
   contrast 'School overall' school 1 -1, school 2 1;
run;
The results of all four TEST statements above are shown in the "Linear Hypotheses Testing Results" table below. The small p-value for the first test indicates that Schools 1 and 2 differ significantly on the Self vs. Class logit (p=0.0022). The signs of the parameters (not shown) involved in the test indicate that School 1 has lower odds of choosing the self-paced style while School 2 has higher odds. The second test shows no significant difference between Schools 1 and 2 on the Team vs. Class logit (p=0.5702). The overall test of differences between the two schools is significant (p=0.0089). Finally, the overall test of differences among all three schools is significant (p=0.0050) and matches the test in the "Type 3 Analysis of Effects" table.
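Type 3 Analysis of Effects
Effect | DF | Wald Chi-Square | Pr > ChiSq |
School | 4 | 14.8424 | 0.0050 |
Program | 2 | 10.9160 | 0.0043 |

Linear Hypotheses Testing Results
Label | Wald Chi-Square | DF | Pr > ChiSq |
School_1_vs_2_on_Self | 9.3598 | 1 | 0.0022 |
School_1_vs_2_on_Team | 0.3224 | 1 | 0.5702 |
School_1_vs_2_overall | 9.4526 | 2 | 0.0089 |
School_overall | 14.8424 | 4 | 0.0050 |

Contrast Test Results
Contrast | DF | Wald Chi-Square | Pr > ChiSq |
School 1 vs 2 overall | 2 | 9.4526 | 0.0089 |
School overall | 4 | 14.8424 | 0.0050 |

Estimates (separate CATEGORY="self" and CATEGORY="team" statements)
Label | Style | Estimate | Standard Error | z Value | p-Value |
School 1 vs 2 | self | -1.0828 | 0.3539 | -3.06 | 0.0022 |
School 1 vs 2 | team | -0.1801 | 0.3172 | -0.57 | 0.5702 |

Estimates (CATEGORY=JOINT)
Label | Style | Estimate | Standard Error | z Value | p-Value |
School 1 vs 2 | self | -1.0828 | 0.3539 | -3.06 | 0.0022 |
School 1 vs 2 | team | -0.1801 | 0.3172 | -0.57 | 0.5702 |

Estimates (two rows with CATEGORY="self","team")
Label | Style | Estimate | Standard Error | z Value | p-Value |
School 1 vs 2 | self | -1.0828 | 0.3539 | -3.06 | 0.0022 |
School 1 vs 2 | team | -0.1801 | 0.3172 | -0.57 | 0.5702 |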
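The 1 DF Wald chi-square statistics are the squares of the corresponding z values, and the reported p-values follow from the chi-square distribution with the stated DF. The following Python sketch, which is not part of the SAS analysis, checks this relationship using only the standard library; the function names are illustrative, and the closed-form tail probability used here holds only for even DF. Because the inputs are the rounded estimates shown in the output, the recomputed statistics agree with the printed ones only to about two decimals.

```python
import math


def wald_1df(estimate, std_err):
    """Return (chi_square, p_value) for a 1 DF Wald test of estimate = 0."""
    z = estimate / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z * z, p_value


def chi_square_p_even_df(chi_square, df):
    """Upper-tail chi-square probability, using the closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = chi_square / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


print(wald_1df(-1.0828, 0.3539))          # School 1 vs 2 on the self logit
print(wald_1df(-0.1801, 0.3172))          # School 1 vs 2 on the team logit
print(chi_square_p_even_df(9.4526, 2))    # School 1 vs 2 overall
print(chi_square_p_even_df(14.8424, 4))   # overall SCHOOL effect
```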
_____
Note: Since the default effects coding (PARAM=EFFECT) is used in the CLASS statement, there are two design variables created for the three-level SCHOOL effect. The coding of these two design variables for each level of SCHOOL is shown in the "Class Level Information" table which shows that both design variables are coded as -1 for School 3. Therefore, the hypothesis School1-School3=0 when written in terms of the model parameters becomes School1-(-School1-School2)=0, or 2*School1+School2=0. This is the form of the hypothesis used in the fourth TEST statement.
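The algebra in this note can be verified numerically. The following Python sketch is an illustration only, with made-up function names; it uses the effect-coded SCHOOL estimates from the fitted model and shows that the School 1 minus School 3 difference equals 2*School1 + School2.

```python
def school3_param(b1, b2):
    """Implied School 3 parameter under effect coding (parameters sum to zero)."""
    return -(b1 + b2)


def school1_vs_school3(b1, b2):
    """School 1 minus School 3, which simplifies to 2*b1 + b2."""
    return b1 - school3_param(b1, b2)


# (School 1, School 2) effect-coded estimates on each logit.
estimates = {"self": (-0.7992, 0.2836), "team": (-0.2786, -0.0985)}

for logit, (b1, b2) in estimates.items():
    diff = school1_vs_school3(b1, b2)
    print(f"{logit}: School3={school3_param(b1, b2):.4f}  "
          f"School1-School3={diff:.4f}  2*b1+b2={2 * b1 + b2:.4f}")
```

Up to rounding of the inputs, the resulting differences agree with the School 1 vs. 3 rows reported by the LSMEANS statement with the DIFF option.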