This example illustrates a log-linear model of independence, by using data that contain structural zero frequencies as well as sampling (random) zero frequencies.
In a population of six squirrel monkeys, the joint distribution of genital display with respect to active or passive role was observed. The data are from Fienberg (1980, Table 8-2). Since a monkey cannot have both the active and passive roles in the same interaction, the diagonal cells of the table are structural zeros. See Agresti (2002) for more information about the quasi-independence model.
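For reference, the quasi-independence model can be written as follows. This is a sketch in common log-linear notation; the symbols $m_{ij}$, $\mu$, $\lambda^{A}_{i}$, and $\lambda^{P}_{j}$ are assumptions introduced here and do not appear in the source. Here $m_{ij}$ is the expected frequency for active monkey $i$ and passive monkey $j$.

```latex
\log m_{ij} = \mu + \lambda^{A}_{i} + \lambda^{P}_{j}, \qquad i \neq j,
\qquad\text{with } m_{ii} = 0 \text{ for every } i \text{ (structural zeros)}.
```

Equivalently, $m_{ij} = a_i\, b_j$ for the off-diagonal cells only: the active and passive roles are independent within the part of the table where an interaction is possible.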
The DATA step replaces the structural zeros with missing values, and the MISSING=STRUCTURAL option in the MODEL statement removes these cells from the analysis. The ZERO=SAMPLING option treats the off-diagonal zeros as sampling zeros. The row for Monkey 't' is also deleted: it contains only zeros, so the cell frequencies predicted for it by a model of independence would also be zero. In addition, the two CONTRAST statements compare the behavior of the monkeys labeled 'u' and 'v'. See the section Structural and Sampling Zeros with Raw Data for information about how to perform this analysis when you have raw data. The following statements produce Output 29.5.1 through Output 29.5.8:
data Display;
   input Active $ Passive $ wt @@;
   /* drop the row for Monkey 't', which contains only zeros */
   if Active ne 't';
   /* diagonal cells are structural zeros: set them to missing */
   if Active eq Passive then wt=.;
   datalines;
r r  0   r s  1   r t  5   r u  8   r v  9   r w  0
s r 29   s s  0   s t 14   s u 46   s v  4   s w  0
t r  0   t s  0   t t  0   t u  0   t v  0   t w  0
u r  2   u s  3   u t  1   u u  0   u v 38   u w  2
v r  0   v s  0   v t  0   v u  0   v v  0   v w  1
w r  9   w s 25   w t  4   w u  6   w v 13   w w  0
;

title 'Behavior of Squirrel Monkeys';
proc catmod data=Display;
   weight wt;
   model Active*Passive=_response_ /
         missing=structural zero=sampling
         freq pred=freq noparm oneway;
   loglin Active Passive;
   /* Passive has 6 levels (5 parameters); Active has 5 levels (4 parameters) */
   contrast 'Passive, U vs. V' Passive 0 0 0 1 -1;
   contrast 'Active,  U vs. V' Active  0 0 1 -1;
   title2 'Test Quasi-Independence for the Incomplete Table';
quit;
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table
Data Summary | |||
---|---|---|---|
Response | Active*Passive | Response Levels | 25 |
Weight Variable | wt | Populations | 1 |
Data Set | DISPLAY | Total Frequency | 220 |
Frequency Missing | 0 | Observations | 25 |
The results of the ONEWAY option are shown in Output 29.5.2. Monkey 't' does not show up as a value for the Active variable since that row was removed.
One-Way Frequencies | ||
---|---|---|
Variable | Value | Frequency |
Active | r | 23 |
Active | s | 93 |
Active | u | 46 |
Active | v | 1 |
Active | w | 57 |
Passive | r | 40 |
Passive | s | 29 |
Passive | t | 24 |
Passive | u | 60 |
Passive | v | 64 |
Passive | w | 3 |
Sampling zeros are displayed as 0 in Output 29.5.4. The Response Number column corresponds to the value displayed in the "Response Profiles" table in Output 29.5.3.
Population Profiles | |
---|---|
Sample | Sample Size |
1 | 220 |
Response Profiles | ||
---|---|---|
Response | Active | Passive |
1 | r | s |
2 | r | t |
3 | r | u |
4 | r | v |
5 | r | w |
6 | s | r |
7 | s | t |
8 | s | u |
9 | s | v |
10 | s | w |
11 | u | r |
12 | u | s |
13 | u | t |
14 | u | v |
15 | u | w |
16 | v | r |
17 | v | s |
18 | v | t |
19 | v | u |
20 | v | w |
21 | w | r |
22 | w | s |
23 | w | t |
24 | w | u |
25 | w | v |
Response Frequencies | |||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample / Response Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
1 | 1 | 5 | 8 | 9 | 0 | 29 | 14 | 46 | 4 | 0 | 2 | 3 | 1 | 38 | 2 | 0 | 0 | 0 | 0 | 1 | 9 | 25 | 4 | 6 | 13 |
The analysis of variance table (Output 29.5.5) shows that the model of independence does not fit: the likelihood ratio test, which assesses the interaction omitted from the model, is significant (chi-square = 135.17 with 15 DF, p < 0.0001). In other words, the active and passive roles of the squirrel monkeys are not independent.
Maximum Likelihood Analysis of Variance | |||
---|---|---|---|
Source | DF | Chi-Square | Pr > ChiSq |
Active | 4 | 56.58 | <.0001 |
Passive | 5 | 47.94 | <.0001 |
Likelihood Ratio | 15 | 135.17 | <.0001 |
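The likelihood ratio statistic and its degrees of freedom can be sketched as follows. The notation is an assumption introduced here: $n_{ij}$ is an observed cell frequency, $\hat m_{ij}$ is the corresponding predicted frequency, and terms with $n_{ij} = 0$ are taken to contribute zero. The counts 25, 4, and 5 come from the "Data Summary" table and the DF column above.

```latex
G^{2} = 2 \sum_{i \neq j} n_{ij} \,\log\frac{n_{ij}}{\hat m_{ij}},
\qquad
\mathrm{DF} = \underbrace{(25 - 1)}_{\text{response functions}}
            - \underbrace{(4 + 5)}_{\text{Active and Passive parameters}} = 15 .
```

Evaluating the sum by hand with the observed and predicted frequencies listed in Output 29.5.8 gives a value of roughly 135, which agrees with the reported 135.17 up to rounding.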
If the model fit these data, the contrasts in Output 29.5.6 would indicate that monkeys 'u' and 'v' have similar passive behavior patterns (p = 0.2524) but very different active behavior patterns (p = 0.0001).
Contrasts of Maximum Likelihood Estimates | |||
---|---|---|---|
Contrast | DF | Chi-Square | Pr > ChiSq |
Passive, U vs. V | 1 | 1.31 | 0.2524 |
Active, U vs. V | 1 | 14.87 | 0.0001 |
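As a rough check on the first contrast, the chi-square can be reproduced from a Wald statistic. This is a sketch under assumptions: $L$ denotes the contrast coefficient row, $\hat\beta$ the parameter estimates, and $\widehat V$ their estimated covariance matrix (none of which are printed here because of the NOPARM option). Under the fitted model, $\lambda^{P}_{u} - \lambda^{P}_{v}$ equals the predicted generalized logit that compares response (w, u) with the baseline response (w, v), which is function 24 in Output 29.5.7 (predicted value 0.211754, standard error 0.185007).

```latex
\begin{aligned}
Q &= (L\hat\beta)^{\top}\bigl(L\,\widehat V\,L^{\top}\bigr)^{-1}(L\hat\beta) \\
  &= \frac{\bigl(\hat\lambda^{P}_{u} - \hat\lambda^{P}_{v}\bigr)^{2}}
          {\widehat{\operatorname{Var}}\bigl(\hat\lambda^{P}_{u} - \hat\lambda^{P}_{v}\bigr)}
   \approx \left(\frac{0.211754}{0.185007}\right)^{2} \approx 1.31 .
\end{aligned}
```

This matches the chi-square reported for the "Passive, U vs. V" contrast.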
Output 29.5.7 displays the predicted response functions and Output 29.5.8 displays predicted cell frequencies (from the PRED=FREQ option), but since the model does not fit, these should be ignored. Note that, since the response function is the generalized logit with the 25th response as the baseline, the observed response functions for the sampling zeros are missing.
Maximum Likelihood Predicted Values for Response Functions | |||||
---|---|---|---|---|---|
Function Number | Observed Function | Observed Standard Error | Predicted Function | Predicted Standard Error | Residual |
1 | -2.56495 | 1.037749 | -0.97355 | 0.339019 | -1.5914 |
2 | -0.95551 | 0.526235 | -1.72504 | 0.345438 | 0.769529 |
3 | -0.48551 | 0.449359 | -0.52751 | 0.309254 | 0.042007 |
4 | -0.36772 | 0.433629 | -0.73927 | 0.249006 | 0.371543 |
5 | . | . | -3.56052 | 0.634104 | . |
6 | 0.802346 | 0.333775 | 0.320589 | 0.26629 | 0.481758 |
7 | 0.074108 | 0.385164 | -0.29934 | 0.295634 | 0.37345 |
8 | 1.263692 | 0.314105 | 0.898184 | 0.250857 | 0.365508 |
9 | -1.17865 | 0.571772 | 0.686431 | 0.173396 | -1.86509 |
10 | . | . | -2.13482 | 0.608071 | . |
11 | -1.8718 | 0.759555 | -0.2415 | 0.287218 | -1.63031 |
12 | -1.46634 | 0.640513 | -0.10994 | 0.303568 | -1.3564 |
13 | -2.56495 | 1.037749 | -0.86143 | 0.314794 | -1.70352 |
14 | 1.072637 | 0.321308 | 0.124346 | 0.204345 | 0.94829 |
15 | -1.8718 | 0.759555 | -2.6969 | 0.617433 | 0.8251 |
16 | . | . | -4.14787 | 1.024508 | . |
17 | . | . | -4.01632 | 1.030062 | . |
18 | . | . | -4.76781 | 1.032457 | . |
19 | . | . | -3.57028 | 1.020794 | . |
20 | -2.56495 | 1.037749 | -6.60328 | 1.161289 | 4.038332 |
21 | -0.36772 | 0.433629 | -0.36584 | 0.202959 | -0.00188 |
22 | 0.653926 | 0.34194 | -0.23429 | 0.232794 | 0.888212 |
23 | -1.17865 | 0.571772 | -0.98577 | 0.239408 | -0.19288 |
24 | -0.77319 | 0.493548 | 0.211754 | 0.185007 | -0.98494 |
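The observed columns of this table can be reproduced from the response frequencies. The following is a sketch in notation introduced here as an assumption: $n_k$ is the observed frequency of response $k$, and $n_{25} = 13$ is the frequency of the baseline response (w, v). The standard error is the usual large-sample approximation for a log ratio of multinomial counts, which agrees with the tabled values.

```latex
F_k = \log\frac{n_k}{n_{25}}, \qquad
\widehat{\mathrm{SE}}(F_k) = \sqrt{\frac{1}{n_k} + \frac{1}{n_{25}}}, \qquad k = 1, \dots, 24 .
% Example, k = 1 (response r,s with n_1 = 1):
F_1 = \log\frac{1}{13} \approx -2.56495, \qquad
\widehat{\mathrm{SE}}(F_1) = \sqrt{1 + \tfrac{1}{13}} \approx 1.037749 .
```

When $n_k = 0$ (a sampling zero), $\log 0$ is undefined, which is why those observed functions and their standard errors are displayed as missing.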
Maximum Likelihood Predicted Values for Frequencies | ||||||
---|---|---|---|---|---|---|
Active | Passive | Observed Frequency | Observed Standard Error | Predicted Frequency | Predicted Standard Error | Residual |
r | s | 1 | 0.997725 | 5.259508 | 1.36156 | -4.25951 |
r | t | 5 | 2.210512 | 2.480726 | 0.691066 | 2.519274 |
r | u | 8 | 2.776525 | 8.215948 | 1.855146 | -0.21595 |
r | v | 9 | 2.937996 | 6.648049 | 1.50932 | 2.351951 |
r | w | 0 | 0 | 0.395769 | 0.240268 | -0.39577 |
s | r | 29 | 5.017696 | 19.18599 | 3.147915 | 9.814007 |
s | t | 14 | 3.620648 | 10.32172 | 2.169599 | 3.678284 |
s | u | 46 | 6.031734 | 34.18463 | 4.428706 | 11.81537 |
s | v | 4 | 1.981735 | 27.66096 | 3.722788 | -23.661 |
s | w | 0 | 0 | 1.6467 | 0.952712 | -1.6467 |
u | r | 2 | 1.407771 | 10.9364 | 2.12322 | -8.9364 |
u | s | 3 | 1.720201 | 12.47407 | 2.554336 | -9.47407 |
u | t | 1 | 0.997725 | 5.883583 | 1.380655 | -4.88358 |
u | v | 38 | 5.606814 | 15.7673 | 2.684692 | 22.2327 |
u | w | 2 | 1.407771 | 0.938652 | 0.551645 | 1.061348 |
v | r | 0 | 0 | 0.219966 | 0.221779 | -0.21997 |
v | s | 0 | 0 | 0.250893 | 0.253706 | -0.25089 |
v | t | 0 | 0 | 0.118338 | 0.120314 | -0.11834 |
v | u | 0 | 0 | 0.391924 | 0.393255 | -0.39192 |
v | w | 1 | 0.997725 | 0.018879 | 0.021728 | 0.981121 |
w | r | 9 | 2.937996 | 9.657645 | 1.808656 | -0.65765 |
w | s | 25 | 4.707344 | 11.01553 | 2.275019 | 13.98447 |
w | t | 4 | 1.981735 | 5.195638 | 1.184452 | -1.19564 |
w | u | 6 | 2.415857 | 17.2075 | 2.772098 | -11.2075 |
w | v | 13 | 3.497402 | 13.92369 | 2.24158 | -0.92369 |
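The observed standard errors in this table are consistent with the multinomial formula below. This is a sketch, with $n$ denoting an observed cell frequency and $N = 220$ the total frequency from the "Data Summary" table; the notation is an assumption introduced here.

```latex
\widehat{\mathrm{SE}}(n) = \sqrt{\,n\left(1 - \frac{n}{N}\right)}, \qquad
\text{Residual} = \text{Observed} - \text{Predicted}.
% Example, cell (r,t) with n = 5:
\sqrt{5\left(1 - \tfrac{5}{220}\right)} \approx 2.210512, \qquad
5 - 2.480726 = 2.519274 .
```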
The preceding PROC CATMOD step uses cell count data as input. Prior to invoking the CATMOD procedure, structural and sampling zeros are easily identified and manipulated in a single DATA step. For the situation where structural or sampling zeros (or both) exist and the input data set is raw data, use the following steps:
1. Run PROC FREQ on the raw data (see Chapter 36, The FREQ Procedure). In the TABLES statement, list all dependent and independent variables, separated by asterisks, and use the SPARSE option and the OUT= option. This creates an output data set that contains all possible zero frequencies. Since the tabled output can be huge, you should also specify the NOPRINT option in the TABLES statement.
2. Use a DATA step to change the zero frequencies associated with either sampling zeros or structural zeros to missing.
3. Use the resulting data set as input to PROC CATMOD, specify the statement WEIGHT COUNT to use adjusted frequencies, and specify the ZERO= and MISSING= options to define your sampling and structural zeros.
For example, suppose the data set RawDisplay contains the raw data for the squirrel monkey data. The following statements show how to obtain the same analysis as shown previously:
proc freq data=RawDisplay;
   tables Active*Passive / sparse out=Combos noprint;
run;

data Combos2;
   set Combos;
   if Active ne 't';
   if Active eq Passive then count=.;
run;

proc catmod data=Combos2;
   weight count;
   model Active*Passive=_response_ /
         zero=sampling missing=structural
         freq pred=freq noparm noresponse;
   loglin Active Passive;
quit;
The first IF statement in the DATA step is needed only for this particular example; since observations for Monkey 't' were deleted from the Display data set, they also need to be deleted from Combos2.
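The RawDisplay data set itself is not created anywhere in this example. To experiment with the raw-data steps, one option is to expand the cell counts into one observation per interaction. The following is a minimal sketch, assuming the Display data set from the first DATA step is still available; the loop variable i is hypothetical and introduced only for illustration. Run it before the PROC FREQ step.

```sas
/* Hypothetical helper: build raw data from the cell counts in Display. */
data RawDisplay;
   set Display;
   /* Cells with a zero or missing count produce no observations. */
   if wt > 0 then do i = 1 to wt;
      output;            /* one row per observed interaction */
   end;
   drop i wt;
run;
```

Cells that produce no observations here (the sampling zeros and the diagonal) are then restored with zero counts by the SPARSE option in PROC FREQ, for every combination of the Active and Passive values that occur in the raw data.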