41516 - Testing hypotheses and estimating odds ratios within and across specific logits in LINK=GLOGIT models

Usage Note 41516: Testing hypotheses and estimating odds ratios within and across specific logits in LINK=GLOGIT models

The generalized logit model is commonly used to model a nominal, multinomial response, that is, a multilevel response whose levels have no inherent ordering. You can use PROC LOGISTIC to fit the generalized logit model by specifying the LINK=GLOGIT option in the MODEL statement. The methods in this note also apply when you fit the generalized logit model to repeated measures data using PROC GEE.

This model is illustrated in the example titled "Nominal Response Data: Generalized Logits Model" in the Examples section of the LOGISTIC documentation. In that example, third graders from three different schools (SCHOOL) were exposed to three different styles (STYLE) of mathematics instruction: a self-paced computer-learning style, a team approach, and a traditional class approach. The students were asked which style they prefer, and their responses were classified by the type of program (PROGRAM) they were in (a regular school day versus a regular day supplemented with an afternoon school program). The final model includes main effects for SCHOOL and PROGRAM. The data set is from the text by Stokes, Davis, and Koch. The following statements fit the final model and save the parameter estimates in the ParmNames data set.

      data school;
         length Program $ 9;
         input School Program $ Style $ Count @@; 
         datalines; 
      1 regular   self 10  1 regular   team 17  1 regular   class 26
      1 afternoon self  5  1 afternoon team 12  1 afternoon class 50 
      2 regular   self 21  2 regular   team 17  2 regular   class 26
      2 afternoon self 16  2 afternoon team 12  2 afternoon class 36 
      3 regular   self 15  3 regular   team 15  3 regular   class 16
      3 afternoon self 12  3 afternoon team 12  3 afternoon class 20 
      ; 
      
      proc logistic data=school outest=ParmNames;
         freq Count; 
         class School Program(ref=first);
         model Style(order=data)=School Program / link=glogit;
      run;

The "Type 3 Analysis of Effects" table provides a test with four degrees of freedom (DF) for SCHOOL, which tells you that there are differences among the schools. The "Analysis of Maximum Likelihood Estimates" table provides the parameter estimates for the fitted model.

Type 3 Analysis of Effects
Effect	DF	Wald Chi-Square	Pr > ChiSq
School	4	14.8424	0.0050
Program	2	10.9160	0.0043

Analysis of Maximum Likelihood Estimates
Parameter		Style	DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		self	1	-0.7978	0.1465	29.6502	<.0001
Intercept		team	1	-0.6589	0.1367	23.2300	<.0001
School	1	self	1	-0.7992	0.2198	13.2241	0.0003
School	1	team	1	-0.2786	0.1867	2.2269	0.1356
School	2	self	1	0.2836	0.1899	2.2316	0.1352
School	2	team	1	-0.0985	0.1892	0.2708	0.6028
Program	regular	self	1	0.3737	0.1410	7.0272	0.0080
Program	regular	team	1	0.3713	0.1353	7.5332	0.0061

To further investigate the nature of the school differences, suppose you are interested in additional tests of these hypotheses:

1. School 1 differs from School 2 on the SELF vs. CLASS logit
2. School 1 differs from School 2 on the TEAM vs. CLASS logit
3. There is an overall difference between Schools 1 and 2

Hypothesis 1 entails testing the difference between the SCHOOL 1 parameter on the SELF logit (-0.7992) and the SCHOOL 2 parameter on the SELF logit (0.2836). Similarly, hypothesis 2 compares the SCHOOL 1 and 2 parameters on the TEAM logit (-0.2786 vs -0.0985). Each of these tests has 1 DF. The third hypothesis is simply a joint test of hypotheses 1 and 2 and has 2 DF.
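
As a quick check of the arithmetic behind hypotheses 1 and 2, the following DATA step is a minimal sketch that copies the four SCHOOL estimates from the "Analysis of Maximum Likelihood Estimates" table, forms the two differences, and exponentiates them. The variable names are chosen here for illustration only. The step gives point estimates only. Testing the hypotheses also requires the standard errors of the differences, which depend on the covariance matrix of the estimates and therefore must come from the procedure.

```sas
data _null_;
   /* Estimates copied by hand from the Analysis of Maximum Likelihood Estimates table */
   school1_self = -0.7992;  school2_self =  0.2836;
   school1_team = -0.2786;  school2_team = -0.0985;

   /* Hypothesis 1: School 1 minus School 2 on the self vs. class logit */
   diff_self = school1_self - school2_self;

   /* Hypothesis 2: School 1 minus School 2 on the team vs. class logit */
   diff_team = school1_team - school2_team;

   /* A difference of logits is a log odds ratio, so exponentiate it */
   or_self = exp(diff_self);
   or_team = exp(diff_team);

   /* Write the results to the SAS log */
   put diff_self= 8.4 or_self= 8.4;
   put diff_team= 8.4 or_team= 8.4;
run;
```

Up to rounding of the displayed estimates, the values written to the log should agree with the differences and odds ratios that PROC LOGISTIC reports for schools 1 and 2.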

Typically, tests of linear combinations of model parameters can be conducted using the CONTRAST statement. In the CONTRAST statement, you refer to parameters by using the name of the corresponding effect (such as SCHOOL) in the model. This is not a problem as long as there is only one parameter associated with each level of a CLASS effect. But in a multinomial model there are multiple parameters for each level of a CLASS effect. For instance, in the above model there are two parameters for each level of SCHOOL because there are two logits being modeled, and a contrast specified in the CONTRAST statement is tested jointly across all logits, so it cannot isolate a single logit as hypotheses 1 and 2 require. Had the response been binary rather than multinomial, you could use the CONTRAST statement to test the hypotheses because there would have been only one parameter per SCHOOL level.

However, for simple pairwise comparisons such as these three hypotheses, it is easiest to use the LSMEANS and LSMESTIMATE statements. While these hypotheses can also be tested using the TEST and ESTIMATE statements as shown below, the TEST statement requires you to know the names of the parameters that PROC LOGISTIC uses internally, and the ESTIMATE statement requires you to determine the appropriate contrast coefficients on the model parameters. The ODDSRATIO statement provides odds ratio estimates and can also test hypotheses within each logit when combined with the ORPVALUE option in the MODEL statement.

Using the LSMEANS, LSMESTIMATE, and ODDSRATIO statements

The following statements fit the model and test the three hypotheses of interest. Note that in order to use the LSMEANS and LSMESTIMATE statements, you must use GLM coding (PARAM=GLM) for the CLASS variables.

The DIFF option in the LSMEANS statement provides comparisons of the schools within each of the two generalized logits. Adding the ODDSRATIO option provides the corresponding odds ratio estimates, and confidence intervals for the odds ratios are included if you also specify the CL option.

The LSMESTIMATE statement enables you to get an overall comparison of schools 1 and 2 by providing a contrast of the LS-means. Particularly for more complex models, determining this contrast is vastly simpler than determining the correct contrast directly on the model parameters. The CATEGORY=JOINT option provides tests comparing the two schools in each logit (1 DF each), and the JOINT option provides a joint test comparing the two schools overall (2 DF). Since differences in the LS-means are differences of logits, adding the EXP option provides estimates of odds ratios for each pair of response levels (self vs. class and team vs. class). Again, confidence intervals are provided if you also specify the CL option.

The ODDSRATIO statement provides odds ratio estimates for each pair of schools on each of the logits. When you specify the ORPVALUE option in the MODEL statement, a p-value is included with each odds ratio.

      proc logistic data=school;
         freq Count;
         class School Program(ref=first) / param=glm;
         model Style(order=data)=School Program / link=glogit orpvalue;
         lsmeans school / diff oddsratio;
         lsmestimate school '1 vs 2 overall' 1 -1 0 / category=joint joint exp;
         oddsratio school;
         run;

Following are the results from the LSMEANS statement. The p-value for hypothesis 1 is p=0.0022, indicating a significant difference between schools 1 and 2 on the first (self) logit. However, the two schools are not significantly different on the second (team) logit (p=0.5702). The estimates of the odds ratios comparing the two schools appear in the Odds Ratio column. The estimate 0.339 is the ratio of the odds of self vs. class in school 1 to the same odds in school 2. Similarly, the estimate 0.835 is the ratio of the odds of team vs. class in school 1 to the same odds in school 2.

SCHOOL Least Squares Means
STYLE	SCHOOL	Estimate	Standard Error	z Value	Pr > |z|
self	1	-1.5970	0.2849	-5.61	<.0001
self	2	-0.5142	0.2107	-2.44	0.0147
self	3	-0.2823	0.2581	-1.09	0.2739
team	1	-0.9375	0.2210	-4.24	<.0001
team	2	-0.7574	0.2279	-3.32	0.0009
team	3	-0.2819	0.2580	-1.09	0.2746

Differences of SCHOOL Least Squares Means
STYLE	SCHOOL	_SCHOOL	Estimate	Standard Error	z Value	Pr > |z|	Odds Ratio
self	1	2	-1.0828	0.3539	-3.06	0.0022	0.339
self	1	3	-1.3147	0.3839	-3.42	0.0006	0.269
self	2	3	-0.2319	0.3327	-0.70	0.4858	0.793
team	1	2	-0.1801	0.3172	-0.57	0.5702	0.835
team	1	3	-0.6556	0.3395	-1.93	0.0535	0.519
team	2	3	-0.4755	0.3436	-1.38	0.1665	0.622

Next are the results from the LSMESTIMATE statement. Note that there are six LS-means in the "SCHOOL Least Squares Means" table above – one for each school on each of the two logits. To test hypothesis 3, we want to compare the two school 1 LS-means to the two school 2 LS-means. The contrast coefficients in the LSMESTIMATE statement apply to the LS-means using the order in which they appear in the table. Note that the comparisons of the two schools in each logit match those from the DIFF and ODDSRATIO options in the LSMEANS statement. The overall comparison of the two schools with 2 DF indicates a significant difference (p=0.0089).

Least Squares Means Estimates
Effect	Label	STYLE	Estimate	Standard Error	z Value	Pr > |z|	Exponentiated
SCHOOL	1 vs 2 overall	self	-1.0828	0.3539	-3.06	0.0022	0.3386
SCHOOL	1 vs 2 overall	team	-0.1801	0.3172	-0.57	0.5702	0.8352

Chi-Square Test for Least Squares Means Estimates
Effect	Label	Num DF	Chi-Square	Pr > ChiSq
SCHOOL	1 vs 2 overall	2	9.45	0.0089
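
The same pattern extends to the other pairs of schools. The following sketch assumes the same data set and model as above and only changes the contrast coefficients, which follow the order of the SCHOOL levels (1, 2, 3) within each logit. The labels are arbitrary, and no results are shown here because they are not part of the output above.

```sas
proc logistic data=school;
   freq Count;
   class School Program(ref=first) / param=glm;
   model Style(order=data)=School Program / link=glogit;

   /* School 1 vs School 3: coefficients 1 0 -1 on the school LS-means */
   lsmestimate school '1 vs 3 overall' 1 0 -1 / category=joint joint exp;

   /* School 2 vs School 3: coefficients 0 1 -1 on the school LS-means */
   lsmestimate school '2 vs 3 overall' 0 1 -1 / category=joint joint exp;
run;
```

The per-logit estimates from these statements should agree with the corresponding rows of the "Differences of SCHOOL Least Squares Means" table, and each joint test has 2 DF.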

Finally, the ODDSRATIO statement and the ORPVALUE option provide the same tests of pairwise school differences in the separate logits, together with the odds ratio estimates and their confidence limits. The p-values and odds ratio estimates for schools 1 and 2 match those obtained from the LSMEANS and LSMESTIMATE statements above.

Odds Ratio Estimates and Wald Confidence Intervals
Odds Ratio	Estimate	95% Confidence Limits		p-Value
STYLE self: SCHOOL 1 vs 2	0.339	0.169	0.678	0.0022
STYLE team: SCHOOL 1 vs 2	0.835	0.449	1.555	0.5702
STYLE self: SCHOOL 1 vs 3	0.269	0.127	0.570	0.0006
STYLE team: SCHOOL 1 vs 3	0.519	0.267	1.010	0.0535
STYLE self: SCHOOL 2 vs 3	0.793	0.413	1.522	0.4858
STYLE team: SCHOOL 2 vs 3	0.622	0.317	1.219	0.1665
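
The LSMEANS and LSMESTIMATE statements shown earlier did not request confidence limits. As a minimal sketch, the CL option mentioned above can be added to both statements as follows. Everything else is unchanged from the earlier step.

```sas
proc logistic data=school;
   freq Count;
   class School Program(ref=first) / param=glm;
   model Style(order=data)=School Program / link=glogit;

   /* CL adds confidence limits for the differences and their odds ratios */
   lsmeans school / diff oddsratio cl;

   /* CL with EXP adds confidence limits for the exponentiated estimates */
   lsmestimate school '1 vs 2 overall' 1 -1 0 / category=joint joint exp cl;
run;
```

The resulting limits for the odds ratios can be compared with the Wald confidence limits in the "Odds Ratio Estimates and Wald Confidence Intervals" table above.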

Using the TEST, ESTIMATE, and CONTRAST statements

You can also test hypotheses like the first two above by using the TEST statement. Beginning in SAS 9.2 TS2M3, you can also use the ESTIMATE statement. In the TEST statement, hypotheses are specified using the unique names associated with the individual parameters in the model. The easiest way to determine the names of the parameters is to include the OUTEST= option when fitting the model as in the LOGISTIC statements above. This option saves the parameters in a data set. You can then print the data set to display the parameter values and their names. The TRANSPOSE step can be used optionally to arrange the parameters vertically rather than horizontally making the display more readable.

      proc transpose data=ParmNames;
         run;
      proc print noobs; 
         run;

The _NAME_ column contains the parameter estimate names. With these names you can construct TEST statements to test the above hypotheses.

_NAME_	_LABEL_	Style
Intercept_self	Intercept: Style=self	-0.798
Intercept_team	Intercept: Style=team	-0.659
School1_self	School 1: Style=self	-0.799
School1_team	School 1: Style=team	-0.279
School2_self	School 2: Style=self	0.284
School2_team	School 2: Style=team	-0.098
Programregular_self	Program regular: Style=self	0.374
Programregular_team	Program regular: Style=team	0.371
_LNLIKE_	Model Log Likelihood	-333.467

Note that unlike the CONTRAST statement, the TEST statement uses a label at the beginning of the statement to label the results rather than a quoted string embedded in the statement. This label must be constructed just as you would create a valid SAS variable name: it must be 32 characters or fewer, contain only letters, numbers, or underscores, and begin with a letter or underscore.
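
The following sketch illustrates the labeling rule with two TEST statements for hypothesis 1. The labels S1_vs_S2_self and S1_eq_S2_self are made up for this illustration. The parameter names are the ones listed in the _NAME_ column above, so they assume the default effects coding used when that data set was created.

```sas
proc logistic data=school;
   freq Count;
   class School Program(ref=first);
   model Style(order=data)=School Program / link=glogit;

   /* Label is a valid SAS name followed by a colon.             */
   /* An expression with no equal sign is tested against zero.   */
   S1_vs_S2_self: test School1_self - School2_self;

   /* The same hypothesis written as an explicit equation */
   S1_eq_S2_self: test School1_self = School2_self;
run;
```

Because both statements specify the same hypothesis, the two rows in the "Linear Hypotheses Testing Results" table should report the same Wald chi-square.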

The ESTIMATE statement can be used to test hypotheses in specific logits. In the following code, the first two TEST statements are each followed by an equivalent ESTIMATE statement for testing hypotheses 1 and 2. Note that the CATEGORY= option specifies the logit to which the contrast refers. Each of the next two ESTIMATE statements reproduces the results of the first two ESTIMATE statements in a single statement. The CATEGORY=JOINT option in the first requests that the contrast be tested in each of the logits. The second ESTIMATE statement specifies two contrast matrix rows (in this case, the same contrast is specified in each row), and the CATEGORY="self","team" option indicates that the contrast in the first row is tested in the "self" logit and the contrast in the second row is tested in the "team" logit.

Hypothesis 3 is a 2 DF test since it tests one school difference in both of the logits. Adding the JOINT option to the previous ESTIMATE statement with CATEGORY=JOINT provides a joint test of both contrasts. The ONLY suboption displays only the joint test and suppresses the test in each logit. Since this hypothesis tests the same contrast in all logits, the CONTRAST statement can also perform the test, and that is shown next.

A test of no differences among the three schools on all logits is automatically provided in the "Type 3 Analysis of Effects" table produced by PROC LOGISTIC. This test can be replicated with the TEST, ESTIMATE, and CONTRAST statements. The last TEST statement provides a joint test that Schools 1 and 2 do not differ on either logit, and that Schools 1 and 3 also do not differ on either logit. This four DF test is an overall test of the SCHOOL effect and replicates the Type 3 test for SCHOOL (see the Note below). The same test is done in the ESTIMATE statement by specifying each contrast, matched in order with the logit name in the CATEGORY= option. The JOINT(ONLY) option again provides just the joint test of these four contrasts (4 DF). The CONTRAST statement provides a joint test of the specified two-row contrast matrix for all logits.

In each of these ESTIMATE statements that define a difference in two logits, the EXP option can be added to estimate the corresponding odds ratio. The CL option can also be added to obtain a confidence interval.

proc logistic data=school;
   freq Count;
   class School Program(ref=first);
   model Style(order=data)=School Program / link=glogit;

      /* Test hypothesis 1 */
   School_1_vs_2_on_Self: test School1_self - School2_self;
   estimate 'School 1 vs 2' school 1 -1 / category="self";

      /* Test hypothesis 2 */
   School_1_vs_2_on_Team: test School1_team - School2_team;
   estimate 'School 1 vs 2' school 1 -1 / category="team";

      /* Test both hypotheses 1 and 2 -- two ways */
   estimate 'School 1 vs 2' school 1 -1 / category=joint;
   estimate 'School 1 vs 2' school 1 -1, 
            'School 1 vs 2' school 1 -1 / category="self","team";

      /* Test hypothesis 3 */
   School_1_vs_2_overall: test School1_self - School2_self, School1_team - School2_team;
   estimate 'School 1 vs 2 overall' school 1 -1 / category=joint joint(only);
   contrast 'School 1 vs 2 overall' school 1 -1;

      /* Test overall school effect -- no school differences on any logit  */
   School_overall: test School1_self - School2_self, School1_team - School2_team,
                        2*School1_self + School2_self, 2*School1_team + School2_team;
   estimate 'School overall' school 1 -1, school 1 -1, school 2 1, school 2 1 / 
                             category="self","team","self","team" joint(only);
   contrast 'School overall' school 1 -1, school 2 1;
   run;

The results of all four TEST statements above are shown in the "Linear Hypotheses Testing Results" table below. The small p-value for the first test indicates that Schools 1 and 2 differ significantly on the Self vs Class logit (p=0.0022). The negative estimated difference (-1.0828, shown in the "Estimate" tables below) indicates that students in School 1 have lower odds than students in School 2 of choosing the self-paced style rather than the class style. The second test shows no significant difference between Schools 1 and 2 on the Team vs Class logit (p=0.5702). The overall test of differences between the two schools is significant (p=0.0089). Finally, the overall test of differences among all three schools is significant (p=0.0050) and matches the test in the "Type 3 Analysis of Effects" table.

Type 3 Analysis of Effects
Effect	DF	Wald Chi-Square	Pr > ChiSq
SCHOOL	4	14.8424	0.0050
PROGRAM	2	10.9160	0.0043

Linear Hypotheses Testing Results
Label	Wald Chi-Square	DF	Pr > ChiSq
School_1_vs_2_on_Self	9.3598	1	0.0022
School_1_vs_2_on_Team	0.3224	1	0.5702
School_1_vs_2_overall	9.4526	2	0.0089
School_overall	14.8424	4	0.0050

Contrast Test Results
Contrast	DF	Wald Chi-Square	Pr > ChiSq
School 1 vs 2 overall	2	9.4526	0.0089
School overall	4	14.8424	0.0050

Estimate
Label	STYLE	Estimate	Standard Error	z Value	Pr > |z|
School 1 vs 2	self	-1.0828	0.3539	-3.06	0.0022

Estimate
Label	STYLE	Estimate	Standard Error	z Value	Pr > |z|
School 1 vs 2	team	-0.1801	0.3172	-0.57	0.5702

Estimates
Label	STYLE	Estimate	Standard Error	z Value	Pr > |z|
School 1 vs 2	self	-1.0828	0.3539	-3.06	0.0022
School 1 vs 2	team	-0.1801	0.3172	-0.57	0.5702

Estimates
Label	STYLE	Estimate	Standard Error	z Value	Pr > |z|
School 1 vs 2	self	-1.0828	0.3539	-3.06	0.0022
School 1 vs 2	team	-0.1801	0.3172	-0.57	0.5702

Chi-Square Test for Estimates
Label	Num DF	Chi-Square	Pr > ChiSq
School 1 vs 2 overall	2	9.45	0.0089

Chi-Square Test for Estimates
Label	Num DF	Chi-Square	Pr > ChiSq
School overall	4	14.84	0.0050
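
As noted above, the EXP and CL options can be added to an ESTIMATE statement that defines a difference of logits. The following is a minimal sketch for the School 1 vs School 2 contrast under the default effects coding used in this section. It is not part of the program that produced the output above.

```sas
proc logistic data=school;
   freq Count;
   class School Program(ref=first);
   model Style(order=data)=School Program / link=glogit;

   /* EXP exponentiates each per-logit estimate to give an odds ratio; */
   /* CL adds confidence limits for the estimate and the odds ratio    */
   estimate 'School 1 vs 2' school 1 -1 / category=joint exp cl;
run;
```

The exponentiated estimates should agree with the odds ratios for schools 1 and 2 reported by the LSMEANS, LSMESTIMATE, and ODDSRATIO statements in the previous section.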

_____

Note: Since the default effects coding (PARAM=EFFECT) is used in the CLASS statement, there are two design variables created for the three-level SCHOOL effect. The coding of these two design variables for each level of SCHOOL is shown in the "Class Level Information" table which shows that both design variables are coded as -1 for School 3. Therefore, the hypothesis School1-School3=0 when written in terms of the model parameters becomes School1-(-School1-School2)=0, or 2*School1+School2=0. This is the form of the hypothesis used in the fourth TEST statement.
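
The relationship in the Note can be checked numerically. The following DATA step is a sketch that copies the SCHOOL estimates from the "Analysis of Maximum Likelihood Estimates" table, derives the implied School 3 parameters, and forms the School 1 vs School 3 differences in both ways. The variable names are illustrative only.

```sas
data _null_;
   /* Estimates copied by hand from the Analysis of Maximum Likelihood Estimates table */
   school1_self = -0.7992;  school2_self =  0.2836;
   school1_team = -0.2786;  school2_team = -0.0985;

   /* Effects coding: the School 3 parameter is minus the sum of the others */
   school3_self = -(school1_self + school2_self);
   school3_team = -(school1_team + school2_team);

   /* School 1 vs School 3, computed directly ...                    */
   direct_self = school1_self - school3_self;
   direct_team = school1_team - school3_team;

   /* ... and in the 2*School1 + School2 form used in the TEST statement */
   test_self = 2*school1_self + school2_self;
   test_team = 2*school1_team + school2_team;

   /* Write the results to the SAS log */
   put direct_self= 8.4 test_self= 8.4;
   put direct_team= 8.4 test_team= 8.4;
run;
```

The two forms give the same value in each logit, and up to rounding of the displayed estimates they should agree with the School 1 vs School 3 rows of the "Differences of SCHOOL Least Squares Means" table.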

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	SAS Reference ==> Procedures ==> LOGISTIC Analytics ==> Categorical Data Analysis

Date Modified:	2021-05-27 16:19:26
Date Created:	2010-11-05 14:42:40
