The CATMOD Procedure

Example 30.5 Log-Linear Model, Structural and Sampling Zeros

This example fits a log-linear model of independence to data that contain both structural zero frequencies and sampling (random) zero frequencies.

In a population of six squirrel monkeys, the joint distribution of genital display with respect to active or passive role was observed. The data are from Fienberg (1980, Table 8-2). Since a monkey cannot have both the active and passive roles in the same interaction, the diagonal cells of the table are structural zeros. See Agresti (2002) for more information about the quasi-independence model.
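
As a sketch of the model being fit (the notation here is introduced for illustration and does not appear in the procedure output), let m_ij denote the expected frequency for active monkey i and passive monkey j. Quasi-independence applies the independence structure only to the off-diagonal cells; the sum-to-zero constraints shown are an assumption about the parameterization:

```latex
\log m_{ij} = \mu + \lambda^{A}_{i} + \lambda^{P}_{j}, \qquad i \ne j,
\qquad \sum_{i} \lambda^{A}_{i} = 0, \quad \sum_{j} \lambda^{P}_{j} = 0
```

The diagonal cells (i = j) are structural zeros and are excluded from the model rather than being fitted as zero counts.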

The DATA step replaces the structural zeros with missing values, and the MISSING=STRUCTURAL option in the MODEL statement removes these cells from the analysis. The ZERO=SAMPLING option treats the off-diagonal zeros as sampling zeros. The row for Monkey 't' is deleted because it contains only zeros, so the cell frequencies predicted by a model of independence would also be zero. In addition, the CONTRAST statements compare the behavior of the two monkeys labeled 'u' and 'v'. See the section Structural and Sampling Zeros with Raw Data for information about how to perform this analysis when you have raw data. The following statements produce Output 30.5.1 through Output 30.5.8:

data Display;
   input Active $ Passive $ wt @@;
   if Active ne 't';                  /* drop the all-zero row for Monkey 't'   */
   if Active eq Passive then wt=.;    /* diagonal cells are structural zeros    */
   datalines;
r r  0   r s  1   r t  5   r u  8   r v  9   r w  0
s r 29   s s  0   s t 14   s u 46   s v  4   s w  0
t r  0   t s  0   t t  0   t u  0   t v  0   t w  0
u r  2   u s  3   u t  1   u u  0   u v 38   u w  2
v r  0   v s  0   v t  0   v u  0   v v  0   v w  1
w r  9   w s 25   w t  4   w u  6   w v 13   w w  0
;
title 'Behavior of Squirrel Monkeys';
proc catmod data=Display;
   weight wt;
   model Active*Passive=_response_ /
         missing=structural zero=sampling
         freq pred=freq noparm oneway;
   loglin Active Passive;
   contrast 'Passive, U vs. V' Passive 0 0 0 1 -1;
   contrast 'Active,  U vs. V' Active  0 0 1 -1;
   title2 'Test Quasi-Independence for the Incomplete Table';
quit;

Output 30.5.1: Log-Linear Model Analysis with Zero Frequencies

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

The CATMOD Procedure

Data Summary
Response           Active*Passive   Response Levels    25
Weight Variable    wt               Populations         1
Data Set           DISPLAY          Total Frequency   220
Frequency Missing  0                Observations       25


The results of the ONEWAY option are shown in Output 30.5.2. Monkey 't' does not show up as a value for the Active variable since that row was removed.

Output 30.5.2: Output from the ONEWAY option

One-Way Frequencies
Variable Value Frequency
Active r 23
  s 93
  u 46
  v 1
  w 57
Passive r 40
  s 29
  t 24
  u 60
  v 64
  w 3


Sampling zeros are displayed as 0 in Output 30.5.4. The response numbers in that table correspond to the Response column of the Response Profiles table in Output 30.5.3.

Output 30.5.3: Profiles

Population Profiles
Sample Sample Size
1 220

Response Profiles
Response Active Passive
1 r s
2 r t
3 r u
4 r v
5 r w
6 s r
7 s t
8 s u
9 s v
10 s w
11 u r
12 u s
13 u t
14 u v
15 u w
16 v r
17 v s
18 v t
19 v u
20 v w
21 w r
22 w s
23 w t
24 w u
25 w v


Output 30.5.4: Frequency of Response by Response Number

Response Frequencies
Sample  Response Number
          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1         1  5  8  9  0 29 14 46  4  0  2  3  1 38  2  0  0  0  0  1  9 25  4  6 13


The analysis of variance table (Output 30.5.5) shows that the model of independence does not fit, because the likelihood ratio test for the interaction is significant (chi-square = 135.17 with 15 DF, p < .0001). In other words, the active and passive roles of the squirrel monkeys are not independent.

Output 30.5.5: Analysis of Variance Table

Maximum Likelihood Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Active 4 56.58 <.0001
Passive 5 47.94 <.0001
Likelihood Ratio 15 135.17 <.0001
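
One way to account for the 15 degrees of freedom of the likelihood ratio test is to subtract the number of model parameters from the number of response functions. This is a sketch of the arithmetic, assuming that each main effect uses one parameter fewer than its number of levels:

```latex
\underbrace{(25 - 1)}_{\text{generalized logits}}
  - \underbrace{(5 - 1)}_{\text{Active}}
  - \underbrace{(6 - 1)}_{\text{Passive}}
  = 24 - 4 - 5 = 15
```

The 25 response levels are the 5 x 6 = 30 cells of the table without the row for Monkey 't', minus the 5 diagonal cells that are structural zeros.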


If the model fit these data, the contrasts in Output 30.5.6 would indicate that monkeys 'u' and 'v' have similar passive behavior patterns (p = 0.2524) but very different active behavior patterns (p = 0.0001).

Output 30.5.6: Contrasts between Monkeys 'u' and 'v'

Contrasts of Maximum Likelihood Estimates
Contrast DF Chi-Square Pr > ChiSq
Passive, U vs. V 1 1.31 0.2524
Active, U vs. V 1 14.87 0.0001
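
The coefficients in the two CONTRAST statements can be read as follows. This is a sketch that assumes the main-effect parameters are ordered by level as in the One-Way Frequencies table, with the parameter for the last level ('w') not estimated separately; the lambda notation is introduced here for illustration:

```latex
\begin{aligned}
\text{Passive:}\quad & H_0\colon\ \lambda^{P}_{u} - \lambda^{P}_{v} = 0,
  & \mathbf{L} &= (0,\ 0,\ 0,\ 1,\ -1),
  & \boldsymbol{\beta}_{P} &= (\lambda^{P}_{r},\ \lambda^{P}_{s},\ \lambda^{P}_{t},\ \lambda^{P}_{u},\ \lambda^{P}_{v})' \\
\text{Active:}\quad & H_0\colon\ \lambda^{A}_{u} - \lambda^{A}_{v} = 0,
  & \mathbf{L} &= (0,\ 0,\ 1,\ -1),
  & \boldsymbol{\beta}_{A} &= (\lambda^{A}_{r},\ \lambda^{A}_{s},\ \lambda^{A}_{u},\ \lambda^{A}_{v})'
\end{aligned}
```

The Active contrast has only four coefficients because the row for Monkey 't' was removed, which leaves five Active levels and four parameters.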


Output 30.5.7 displays the predicted response functions and Output 30.5.8 displays the predicted cell frequencies (from the PRED=FREQ option), but because the model does not fit, these predictions should be ignored. Note that the response function is the generalized logit with the 25th response as the baseline. Because the logarithm of a zero frequency is undefined, the observed response functions for the sampling zeros are missing.

Output 30.5.7: Response Function Predicted Values

Maximum Likelihood Predicted Values for Response Functions
Function Number  Observed Function  Observed Standard Error  Predicted Function  Predicted Standard Error  Residual
1 -2.56495 1.037749 -0.97355 0.339019 -1.5914
2 -0.95551 0.526235 -1.72504 0.345438 0.769529
3 -0.48551 0.449359 -0.52751 0.309254 0.042007
4 -0.36772 0.433629 -0.73927 0.249006 0.371543
5 . . -3.56052 0.634104 .
6 0.802346 0.333775 0.320589 0.26629 0.481758
7 0.074108 0.385164 -0.29934 0.295634 0.37345
8 1.263692 0.314105 0.898184 0.250857 0.365508
9 -1.17865 0.571772 0.686431 0.173396 -1.86509
10 . . -2.13482 0.608071 .
11 -1.8718 0.759555 -0.2415 0.287218 -1.63031
12 -1.46634 0.640513 -0.10994 0.303568 -1.3564
13 -2.56495 1.037749 -0.86143 0.314794 -1.70352
14 1.072637 0.321308 0.124346 0.204345 0.94829
15 -1.8718 0.759555 -2.6969 0.617433 0.8251
16 . . -4.14787 1.024508 .
17 . . -4.01632 1.030062 .
18 . . -4.76781 1.032457 .
19 . . -3.57028 1.020794 .
20 -2.56495 1.037749 -6.60328 1.161289 4.038332
21 -0.36772 0.433629 -0.36584 0.202959 -0.00188
22 0.653926 0.34194 -0.23429 0.232794 0.888212
23 -1.17865 0.571772 -0.98577 0.239408 -0.19288
24 -0.77319 0.493548 0.211754 0.185007 -0.98494
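
The observed columns of Output 30.5.7 can be checked by hand. The following is a sketch in which n_k denotes the observed frequency of response k and n_25 = 13 is the frequency of the baseline response ('w', 'v'). The standard error formula is the usual large-sample approximation for a log ratio of multinomial counts, and it reproduces the tabled values:

```latex
F_k = \log\frac{n_k}{n_{25}}, \qquad
\operatorname{se}(F_k) \approx \sqrt{\frac{1}{n_k} + \frac{1}{n_{25}}}, \qquad k = 1, \dots, 24

% Response 1 (Active r, Passive s), with n_1 = 1:
F_1 = \log\frac{1}{13} \approx -2.56495, \qquad
\operatorname{se}(F_1) \approx \sqrt{1 + \frac{1}{13}} \approx 1.037749
```

For a sampling zero such as response 5, n_k = 0 and the logarithm is undefined, which is why those observed functions appear as missing values.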


Output 30.5.8: Predicted Frequencies

Maximum Likelihood Predicted Values for Frequencies
Active  Passive  Observed Frequency  Observed Standard Error  Predicted Frequency  Predicted Standard Error  Residual
r s 1 0.997725 5.259508 1.36156 -4.25951
r t 5 2.210512 2.480726 0.691066 2.519274
r u 8 2.776525 8.215948 1.855146 -0.21595
r v 9 2.937996 6.648049 1.50932 2.351951
r w 0 0 0.395769 0.240268 -0.39577
s r 29 5.017696 19.18599 3.147915 9.814007
s t 14 3.620648 10.32172 2.169599 3.678284
s u 46 6.031734 34.18463 4.428706 11.81537
s v 4 1.981735 27.66096 3.722788 -23.661
s w 0 0 1.6467 0.952712 -1.6467
u r 2 1.407771 10.9364 2.12322 -8.9364
u s 3 1.720201 12.47407 2.554336 -9.47407
u t 1 0.997725 5.883583 1.380655 -4.88358
u v 38 5.606814 15.7673 2.684692 22.2327
u w 2 1.407771 0.938652 0.551645 1.061348
v r 0 0 0.219966 0.221779 -0.21997
v s 0 0 0.250893 0.253706 -0.25089
v t 0 0 0.118338 0.120314 -0.11834
v u 0 0 0.391924 0.393255 -0.39192
v w 1 0.997725 0.018879 0.021728 0.981121
w r 9 2.937996 9.657645 1.808656 -0.65765
w s 25 4.707344 11.01553 2.275019 13.98447
w t 4 1.981735 5.195638 1.184452 -1.19564
w u 6 2.415857 17.2075 2.772098 -11.2075
w v 13 3.497402 13.92369 2.24158 -0.92369
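
The predicted frequencies in Output 30.5.8 are linked to the predicted response functions in Output 30.5.7 through the inverse of the generalized logit. In this sketch, n = 220 is the total frequency, F-hat_k is the predicted function for response k, and m-hat_k is the predicted frequency (notation introduced here for illustration):

```latex
\hat m_{25} = \frac{n}{1 + \sum_{j=1}^{24} e^{\hat F_j}}, \qquad
\hat m_k = \hat m_{25}\, e^{\hat F_k}, \qquad k = 1, \dots, 24

% Response 1 (Active r, Passive s):
\hat m_1 = 13.92369 \times e^{-0.97355} \approx 5.26, \qquad
\text{residual} = 1 - 5.259508 \approx -4.25951
```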


Structural and Sampling Zeros with Raw Data

The preceding PROC CATMOD step uses cell count data as input, so the structural and sampling zeros are easily identified and manipulated in a single DATA step before the CATMOD procedure is invoked. When structural or sampling zeros (or both) exist and the input data set contains raw data, use the following steps:

  1. Run PROC FREQ on the raw data (see Chapter 38: The FREQ Procedure). In the TABLES statement, list all dependent and independent variables, separated by asterisks, and use the SPARSE option and the OUT= option. This creates an output data set that contains all possible zero frequencies. Because the printed table can be very large, you should also specify the NOPRINT option in the TABLES statement.

  2. Use a DATA step to change the zero frequencies associated with either sampling zeros or structural zeros to missing.

  3. Use the resulting data set as input to PROC CATMOD, specify the WEIGHT COUNT statement to use the adjusted frequencies, and specify the ZERO= and MISSING= options to define your sampling and structural zeros.

For example, suppose the data set RawDisplay contains the raw data for the squirrel monkey data. The following statements show how to obtain the same analysis as shown previously:

proc freq data=RawDisplay;
   tables Active*Passive / sparse out=Combos noprint;
run;
data Combos2;
   set Combos;
   if Active ne 't';                    /* needed only for this example       */
   if Active eq Passive then count=.;   /* mark structural zeros as missing   */
run;
proc catmod data=Combos2;
   weight count;
   model Active*Passive=_response_ /
         zero=sampling missing=structural
         freq pred=freq noparm noresponse;
   loglin Active Passive;
quit;

The first IF statement in the DATA step is needed only for this particular example; since observations for Monkey 't' were deleted from the Display data set, they also need to be deleted from Combos2.