Example 8.1: Generalized Logit Model
Halloween trick-or-treaters are given a choice of candy in three bowls: one bowl
contains small chocolate candy bars, one contains lollipops, and the last
contains sugar candies. The children are classified by gender and by apparent
age (child and teenager), and the following candy preferences are observed:
| | | Candy | | |
| Gender | Age | chocolate | lollipop | sugar |
| boy | child | 2 | 13 | 3 |
| boy | teenager | 10 | 9 | 3 |
| girl | child | 3 | 9 | 1 |
| girl | teenager | 8 | 0 | 1 |
Interest centers on whether age or gender affects the choice of type of candy.
A generalized logit model can be fit to relate these three factors. Since the
data set may be too small for the asymptotic analysis to be valid, an exact
analysis is also performed. The following statements perform this analysis.
data halloween;
   format Candy $9.;
   input Gender $ Age $ Candy $ count @@;
   datalines;
boy child chocolate 2 boy teenager chocolate 10
boy child lollipop 13 boy teenager lollipop 9
boy child sugar 3 boy teenager sugar 3
girl child chocolate 3 girl teenager chocolate 8
girl child lollipop 9 girl teenager lollipop 0
girl child sugar 1 girl teenager sugar 1
;
proc logistic data=halloween;
   freq count;
   class Gender(ref='girl') Age(ref='child') / param=ref;
   model Candy(ref='chocolate') = Gender Age / link=glogit;
   exact Gender Age / joint estimate=both;
run;
Reference levels for both Gender and Age are declared in the CLASS
statement, while the reference level for Candy is specified in the MODEL
statement. Since the response is nominal, a generalized logit model is fit by
specifying the LINK=GLOGIT option. For the exact analysis, the JOINT option
requests a joint test for the parameters Gender and Age, conditional on the
intercepts, in addition to the individual tests, and the ESTIMATE=BOTH option
requests both exact parameter estimates and exact odds ratios. Output 8.1.1
through Output 8.1.6 display the results of the analyses.
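As a check on the input data before reading the output, the cell counts can be tabulated directly. The following is a minimal sketch, assuming the halloween data set created above; the ZEROS option in the WEIGHT statement is included so that the girl/teenager/lollipop cell, which has a count of 0, still appears in the table.

```sas
/* Cross-tabulate candy choice by gender and age, weighting by the cell counts. */
/* ZEROS keeps observations whose weight is 0, so the empty cell is displayed.  */
proc freq data=halloween;
   weight count / zeros;
   tables Gender*Age*Candy / nocol nopercent;
run;
```

The resulting two tables (one per gender) should match the table of observed candy preferences shown earlier.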
Output 8.1.1:
| Model Information |
| Data Set | WORK.HALLOWEEN |
| Response Variable | Candy |
| Number of Response Levels | 3 |
| Number of Observations | 11 |
| Frequency Variable | count |
| Sum of Frequencies | 62 |
| Model | generalized logit |
| Optimization Technique | Fisher's scoring |
NOTE: 1 observation having zero frequency or weight was excluded since it does not contribute to the analysis.
Output 8.1.2:
| Response Profile |
| Ordered Value | Candy | Total Frequency |
| 1 | chocolate | 23 |
| 2 | lollipop | 31 |
| 3 | sugar | 8 |
Logits modeled use Candy='chocolate' as the reference category.
The "Response Profile" table (Output 8.1.2) indicates that 'chocolate'
is the reference category for the Candy variable, so the logits being
modeled are
   log( Pr(Candy = lollipop) / Pr(Candy = chocolate) )
and
   log( Pr(Candy = sugar) / Pr(Candy = chocolate) )
Output 8.1.3:
| Class Level Information |
| Class | Value | Design Variable 1 |
| Gender | boy | 1 |
| | girl | 0 |
| Age | child | 0 |
| | teenager | 1 |
The "Class Level Information" table (Output 8.1.3) shows that 'girl' and
'child' are the reference levels for Gender and Age, respectively.
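The reference coding in Output 8.1.3 can be written out by hand to make the design variables concrete. The following sketch is for illustration only: PROC LOGISTIC builds these columns internally when PARAM=REF is specified, and the variable names GenderBoy and AgeTeenager are hypothetical.

```sas
/* Illustration only: reproduce the PARAM=REF design variables by hand. */
/* GenderBoy and AgeTeenager are hypothetical names.                    */
data design;
   set halloween;
   GenderBoy   = (Gender = 'boy');      /* 1 for boy, 0 for girl (reference)       */
   AgeTeenager = (Age = 'teenager');    /* 1 for teenager, 0 for child (reference) */
run;

proc print data=design noobs;
   var Gender GenderBoy Age AgeTeenager;
run;
```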
Output 8.1.4: Asymptotic Results
| Model Convergence Status |
| Convergence criterion (GCONV=1E-8) satisfied. |
| Model Fit Statistics |
| Criterion | Intercept Only | Intercept and Covariates |
| AIC | 125.354 | 114.448 |
| SC | 129.608 | 127.210 |
| -2 Log L | 121.354 | 102.448 |
| Testing Global Null Hypothesis: BETA=0 |
| Test | Chi-Square | DF | Pr > ChiSq |
| Likelihood Ratio | 18.9061 | 4 | 0.0008 |
| Score | 16.9631 | 4 | 0.0020 |
| Wald | 12.8115 | 4 | 0.0122 |
| Type III Analysis of Effects |
| Effect | DF | Wald Chi-Square | Pr > ChiSq |
| Gender | 2 | 4.7168 | 0.0946 |
| Age | 2 | 12.2325 | 0.0022 |
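All three global tests in Output 8.1.4 reject the null hypothesis that every
slope parameter is zero, so at least one of the covariates is associated with
the choice of candy. The Type III tests indicate that Age is significant
(p = 0.0022), whereas Gender has only marginal influence (p = 0.0946).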
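The fit statistics and the likelihood ratio test in Output 8.1.4 are related by simple arithmetic, which the following DATA step sketches. It assumes the usual definitions AIC = -2 Log L + 2p and SC = -2 Log L + p log(n), with n taken as the sum of frequencies (62) and p as the number of estimated parameters; the data set and variable names are hypothetical. Because the inputs are the rounded values printed in the output, the results agree with the table only up to rounding.

```sas
/* Recompute AIC, SC, and the likelihood ratio test from the printed -2 Log L values. */
data fitcheck;
   n      = 62;         /* sum of frequencies (Output 8.1.1)        */
   m2ll_0 = 121.354;    /* -2 Log L, intercepts only                */
   m2ll_1 = 102.448;    /* -2 Log L, intercepts and covariates      */
   p0     = 2;          /* two intercepts                           */
   p1     = 6;          /* two intercepts plus four slopes          */

   aic_0 = m2ll_0 + 2*p0;
   aic_1 = m2ll_1 + 2*p1;
   sc_0  = m2ll_0 + p0*log(n);
   sc_1  = m2ll_1 + p1*log(n);

   lr_chisq = m2ll_0 - m2ll_1;                 /* likelihood ratio chi-square   */
   lr_df    = p1 - p0;                         /* 4 degrees of freedom          */
   lr_p     = 1 - probchi(lr_chisq, lr_df);    /* upper-tail chi-square p-value */
run;

proc print data=fitcheck noobs;
run;
```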
Output 8.1.5: Asymptotic Results (continued)
| Analysis of Maximum Likelihood Estimates |
| Parameter | | Candy | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
| Intercept | | lollipop | 1 | 0.7698 | 0.5782 | 1.7722 | 0.1831 |
| Intercept | | sugar | 1 | -0.9033 | 0.8664 | 1.0869 | 0.2972 |
| Gender | boy | lollipop | 1 | 1.5758 | 0.7569 | 4.3347 | 0.0373 |
| Gender | boy | sugar | 1 | 1.5261 | 1.0158 | 2.2570 | 0.1330 |
| Age | teenager | lollipop | 1 | -2.6472 | 0.7572 | 12.2212 | 0.0005 |
| Age | teenager | sugar | 1 | -1.7416 | 0.9623 | 3.2754 | 0.0703 |
| Odds Ratio Estimates |
| Effect | Candy | Point Estimate | 95% Wald Lower Confidence Limit | 95% Wald Upper Confidence Limit |
| Gender boy vs girl | lollipop | 4.835 | 1.097 | 21.313 |
| Gender boy vs girl | sugar | 4.600 | 0.628 | 33.686 |
| Age teenager vs child | lollipop | 0.071 | 0.016 | 0.313 |
| Age teenager vs child | sugar | 0.175 | 0.027 | 1.155 |
The parameter estimates and odds ratios for the asymptotic analysis are
displayed in Output 8.1.5. They show that the odds of choosing a lollipop over
a chocolate bar are about five times higher (4.835 ≈ 5) for boys than for
girls, and that the odds of choosing a lollipop over a chocolate bar are about
14 times higher (1/0.071 ≈ 14) for a child than for a teenager.
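The odds ratios and Wald confidence limits in Output 8.1.5 are the exponentiated parameter estimates and the exponentiated limits of estimate ± z × standard error. The following DATA step is a minimal sketch of that arithmetic, using the rounded estimates and standard errors printed in the output; the data set name and variable names are hypothetical, and small rounding differences from the table are to be expected.

```sas
/* Recompute odds ratios and 95% Wald limits from the printed estimates. */
data oddsratios;
   length Effect $8 Candy $9;
   input Effect $ Candy $ Estimate StdErr;
   z          = probit(0.975);             /* about 1.96 for 95% limits         */
   OddsRatio  = exp(Estimate);
   LowerCL    = exp(Estimate - z*StdErr);
   UpperCL    = exp(Estimate + z*StdErr);
   Reciprocal = 1 / OddsRatio;             /* odds ratio with the levels swapped */
   datalines;
Gender lollipop  1.5758 0.7569
Gender sugar     1.5261 1.0158
Age    lollipop -2.6472 0.7572
Age    sugar    -1.7416 0.9623
;

proc print data=oddsratios noobs;
   format OddsRatio LowerCL UpperCL Reciprocal 8.3;
run;
```

The Reciprocal column gives the child-versus-teenager odds ratio for the Age rows, which is the source of the "about 14 times" statement.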
Note in the "Analysis of Maximum Likelihood Estimates" table that the
dummy parameters for the class variables are labeled by their nonreference
level, and that the "Candy" column indicates the nonreference response
category for the logit.
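To see how the two logits combine into fitted probabilities, the estimates in the "Analysis of Maximum Likelihood Estimates" table can be inverted by hand: each nonreference probability is exp(logit) divided by one plus the sum of the exponentiated logits, and the chocolate probability is the remainder. The following sketch assumes the rounded estimates from Output 8.1.5; the data set and variable names are hypothetical, with GenderBoy and AgeTeenager standing for the design variables.

```sas
/* Fitted category probabilities for each Gender-by-Age combination. */
data predicted;
   do GenderBoy = 0 to 1;           /* 0 = girl (reference), 1 = boy       */
      do AgeTeenager = 0 to 1;      /* 0 = child (reference), 1 = teenager */
         /* Linear predictors for the two logits (chocolate is the reference) */
         eta_lollipop =  0.7698 + 1.5758*GenderBoy - 2.6472*AgeTeenager;
         eta_sugar    = -0.9033 + 1.5261*GenderBoy - 1.7416*AgeTeenager;
         denom        = 1 + exp(eta_lollipop) + exp(eta_sugar);
         p_chocolate  = 1 / denom;
         p_lollipop   = exp(eta_lollipop) / denom;
         p_sugar      = exp(eta_sugar) / denom;
         output;
      end;
   end;
run;

proc print data=predicted noobs;
   var GenderBoy AgeTeenager p_chocolate p_lollipop p_sugar;
   format p_chocolate p_lollipop p_sugar 6.3;
run;
```

In each row the three probabilities sum to 1 by construction.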
Output 8.1.6: Exact Results
| Exact Conditional Analysis |
| Conditional Exact Tests |
| Effect | Test | Statistic | Exact p-Value | Mid p-Value |
| Joint | Score | 16.6895 | 0.0013 | 0.0013 |
| | Probability | 7.115E-7 | 0.0009 | 0.0009 |
| Gender | Score | 5.0830 | 0.0870 | 0.0835 |
| | Probability | 0.00697 | 0.0988 | 0.0953 |
| Age | Score | 14.5093 | 0.0003 | 0.0003 |
| | Probability | 0.000032 | 0.0003 | 0.0003 |
| Exact Parameter Estimates |
| Parameter | | Candy | Estimate | 95% Lower Confidence Limit | 95% Upper Confidence Limit | p-Value |
| Gender | boy | lollipop | 1.5017 | -0.0692 | 3.4081 | 0.0641 |
| Gender | boy | sugar | 1.4114 | -0.7079 | 4.0869 | 0.2715 |
| Age | teenager | lollipop | -2.5231 | -4.4303 | -0.9979 | 0.0002 |
| Age | teenager | sugar | -1.6244 | -3.9734 | 0.5146 | 0.1673 |
| Exact Odds Ratios |
| Parameter | | Candy | Estimate | 95% Lower Confidence Limit | 95% Upper Confidence Limit | p-Value |
| Gender | boy | lollipop | 4.489 | 0.933 | 30.209 | 0.0641 |
| Gender | boy | sugar | 4.102 | 0.493 | 59.553 | 0.2715 |
| Age | teenager | lollipop | 0.080 | 0.012 | 0.369 | 0.0002 |
| Age | teenager | sugar | 0.197 | 0.019 | 1.673 | 0.1673 |
The exact analysis (Output 8.1.6) produces results similar to the asymptotic
analysis. The exact score statistic for the joint test (16.6895) is very close
to the asymptotic global score statistic (16.9631), and the parameter estimates
and odds ratios are quite similar. However, in the exact analysis the contrast
between boys and girls is only marginally significant: the exact p-value for
the lollipop logit is 0.0641, compared with 0.0373 for the asymptotic Wald
test.
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.