The CATMOD Procedure

Example 1.1: Log-Linear Independence Model with Structural and Sampling Zeros

This example illustrates a log-linear model of independence, using data that contain structural zero frequencies as well as sampling (random) zero frequencies.

In a population of six squirrel monkeys, the joint distribution of genital display with respect to active or passive role was observed. The data are from Fienberg (1980, Table 8-2). The following DATA step creates the SAS data set Display:

   title 'Behavior of Squirrel Monkeys';
   data Display;
      input Active $ Passive $ wt @@;
      datalines;
   r r  .   r s  1   r t  5   r u  8   r v  9   r w  0
   s r 29   s s  .   s t 14   s u 46   s v  4   s w  0
   t r  0   t s  0   t t  .   t u  0   t v  0   t w  0
   u r  2   u s  3   u t  1   u u  .   u v 38   u w  2
   v r  0   v s  0   v t  0   v u  0   v v  .   v w  1
   w r  9   w s 25   w t  4   w u  6   w v 13   w w  . 
   ;

In this data set, since a monkey cannot have both active and passive roles in an interaction, the values on the diagonal are structural zeros. Any off-diagonal zeros are sampling zeros. Since there are two types of zeros in this data set, missing values are placed on the diagonal to represent the structural zeros.
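As a quick check on this coding, a DATA step along the following lines could tally the two kinds of zeros in Display. This is a minimal sketch and not part of the original example; the counter names nStructural and nSampling are illustrative.

```sas
   data _null_;
      set Display end=last;
      /* a missing wt marks a structural zero (diagonal cell) */
      if wt = . then nStructural + 1;
      /* wt = 0 marks a sampling zero (off-diagonal cell) */
      else if wt = 0 then nSampling + 1;
      /* write both counters to the log after the last observation */
      if last then put nStructural= nSampling=;
   run;
```

For the data shown above, the log should report 6 structural zeros and 11 sampling zeros.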

Suppose you are interested in studying the independence of the active and passive roles. Since the diagonal cells are structural zeros, you are actually fitting a quasi-independence model; refer to Agresti (1990) for more information. Since monkey 't' never takes the active role, an independence model predicts a frequency of zero for every cell in that row; the WHERE statement removes these cells from the analysis.
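In symbols, the quasi-independence model can be written as follows. The notation is assumed here for illustration and is not taken from the original: m_ij is the expected frequency for active monkey i and passive monkey j, and alpha_i and beta_j are the Active and Passive main effects.

```latex
\log m_{ij} = \mu + \alpha_i + \beta_j \qquad \text{for } i \neq j
```

The diagonal cells (i = j) are excluded from the model rather than being fitted as zeros.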

The following statements produce the analysis that treats the missing values on the diagonals as structural zeros (since the MISSING=STRUCTURAL option is the default for one population). The ZERO=SAMPLING option treats the remaining zeros as sampling zeros.

   proc catmod data=Display;
      weight wt;
      where Active ^= 't';
      model Active*Passive=_response_ 
            / ml=ipf(parm) zero=sampling;
      loglin Active Passive;
   run;
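Because MISSING=STRUCTURAL is only the default here, the same analysis can be written with the option stated explicitly, which may make the intent clearer to a reader of the program. The following sketch should be equivalent to the statements above:

```sas
   proc catmod data=Display;
      weight wt;
      where Active ^= 't';
      /* missing=structural spells out the one-population default */
      model Active*Passive=_response_ 
            / ml=ipf(parm) missing=structural zero=sampling;
      loglin Active Passive;
   run;
```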

Output 1.1.1: Data Summary and Population Profile
 
Behavior of Squirrel Monkeys

The CATMOD Procedure

Data Summary
Response           Active*Passive   Response Levels   25
Weight Variable    wt               Populations       1
Data Set           DISPLAY          Total Frequency   220
Frequency Missing  0                Observations      25

Population Profiles
Sample  Sample Size
1       220

The response profiles, shown in Output 1.1.2, include the off-diagonal zero cells because of the ZERO=SAMPLING option.

Output 1.1.2: Response Profiles
 
Response Profiles
Response Active Passive
1 r s
2 r t
3 r u
4 r v
5 r w
6 s r
7 s t
8 s u
9 s v
10 s w
11 u r
12 u s
13 u t
14 u v
15 u w
16 v r
17 v s
18 v t
19 v u
20 v w
21 w r
22 w s
23 w t
24 w u
25 w v

Because the PARM option is specified, a weighted least squares analysis is performed on the IPF fitted data and the _Response_ Matrix is displayed (Output 1.1.3); this table can be suppressed with the NORESPONSE option.

Output 1.1.3: _Response_ Matrix
 
_Response_ Matrix
  1 2 3 4 5 6 7 8 9
1 1 0 0 0 0 1 0 0 0
2 1 0 0 0 0 0 1 0 0
3 1 0 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 0 1
5 1 0 0 0 -1 -1 -1 -1 -1
6 0 1 0 0 1 0 0 0 0
7 0 1 0 0 0 0 1 0 0
8 0 1 0 0 0 0 0 1 0
9 0 1 0 0 0 0 0 0 1
10 0 1 0 0 -1 -1 -1 -1 -1
11 0 0 1 0 1 0 0 0 0
12 0 0 1 0 0 1 0 0 0
13 0 0 1 0 0 0 1 0 0
14 0 0 1 0 0 0 0 0 1
15 0 0 1 0 -1 -1 -1 -1 -1
16 0 0 0 1 1 0 0 0 0
17 0 0 0 1 0 1 0 0 0
18 0 0 0 1 0 0 1 0 0
19 0 0 0 1 0 0 0 1 0
20 0 0 0 1 -1 -1 -1 -1 -1
21 -1 -1 -1 -1 1 0 0 0 0
22 -1 -1 -1 -1 0 1 0 0 0
23 -1 -1 -1 -1 0 0 1 0 0
24 -1 -1 -1 -1 0 0 0 1 0
25 -1 -1 -1 -1 0 0 0 0 1
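Each row of the _Response_ matrix gives the coefficients applied to the nine parameters for that response profile. As an illustration, row 5 (Active=r, Passive=w) can be read as follows. The symbols theta_1 through theta_9 for the parameters and m for the expected frequency are notation assumed here, and the normalizing constant is left unspecified.

```latex
\log m_{rw} = \theta_1 - (\theta_5 + \theta_6 + \theta_7 + \theta_8 + \theta_9) + \text{constant}
```

The -1 entries appear because w is the last level of Passive, so its effect is the negative sum of the other five Passive parameters. The same convention applies to Active in rows 21 through 25.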

The iteration history (Output 1.1.4) displays the value of -2 log likelihood and the convergence criterion at each iteration of the IPF method, as discussed in the "Computational Formulas" section.

Output 1.1.4: Iteration History
 
Maximum Likelihood Analysis
Iteration   -2 Log Likelihood   Convergence Criterion
0           1201.5105           1.0000
1           1198.5669           0.002450
2           1198.5604           5.4468E-6
3           1198.5603           7.702E-8
4           1198.5603           1.6932E-9
 
The IPF algorithm converged.
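The tabulated criterion values are consistent with a relative change in -2 log likelihood between successive iterations. This reading is inferred from the numbers in the table, not quoted from the "Computational Formulas" section. For iteration 1:

```latex
\frac{1201.5105 - 1198.5669}{1201.5105} = \frac{2.9436}{1201.5105} \approx 0.002450
```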

The "Response Functions and Design Matrix" table (Output 1.1.5) is displayed when the PARM option is specified; this table can be suppressed with the NODESIGN option. The logits are computed from the IPF fitted values rather than the original data.

Output 1.1.5: Response Functions, Design Matrix
 
Response Functions and Design Matrix
        Function  Response   Design Matrix
Sample    Number  Function     1   2   3   4   5   6   7   8   9
1              1  -0.97354     2   1   1   1   0   1   0   0  -1
               2  -1.72504     2   1   1   1   0   0   1   0  -1
               3  -0.52752     2   1   1   1   0   0   0   1  -1
               4  -0.73927     2   1   1   1   0   0   0   0   0
               5  -3.56052     2   1   1   1  -1  -1  -1  -1  -2
               6   0.32061     1   2   1   1   1   0   0   0  -1
               7  -0.29932     1   2   1   1   0   0   1   0  -1
               8   0.89820     1   2   1   1   0   0   0   1  -1
               9   0.68645     1   2   1   1   0   0   0   0   0
              10  -2.13480     1   2   1   1  -1  -1  -1  -1  -2
              11  -0.24152     1   1   2   1   1   0   0   0  -1
              12  -0.10995     1   1   2   1   0   1   0   0  -1
              13  -0.86145     1   1   2   1   0   0   1   0  -1
              14   0.12432     1   1   2   1   0   0   0   0   0
              15  -2.69693     1   1   2   1  -1  -1  -1  -1  -2
              16  -4.14787     1   1   1   2   1   0   0   0  -1
              17  -4.01631     1   1   1   2   0   1   0   0  -1
              18  -4.76780     1   1   1   2   0   0   1   0  -1
              19  -3.57029     1   1   1   2   0   0   0   1  -1
              20  -6.60328     1   1   1   2  -1  -1  -1  -1  -2
              21  -0.36584     0   0   0   0   1   0   0   0  -1
              22  -0.23427     0   0   0   0   0   1   0   0  -1
              23  -0.98577     0   0   0   0   0   0   1   0  -1
              24   0.21175     0   0   0   0   0   0   0   1  -1
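Each response function can be reproduced from the IPF predicted frequencies in Output 1.1.8 as the log of a cell's predicted frequency divided by that of the last response profile (response 25: Active=w, Passive=v). Likewise, each design matrix row equals the corresponding _Response_ matrix row minus row 25. The following DATA step is a small sketch of the first calculation; the variable names are illustrative.

```sas
   data _null_;
      /* IPF predicted frequencies copied from Output 1.1.8 */
      pred_rs = 5.259562;   /* response 1:  Active=r, Passive=s */
      pred_wv = 13.92365;   /* response 25: Active=w, Passive=v */
      /* generalized logit of response 1 against response 25 */
      logit1 = log(pred_rs / pred_wv);
      put logit1= 8.5;
   run;
```

The value written to the log should be close to the -0.97354 shown for function number 1.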

The ANOVA table and the parameter estimates are by-products of running WLS on the IPF-fitted values. Note that the likelihood ratio chi-square (the goodness-of-fit statistic G²) in the ANOVA table is computed by the IPF routine; however, the degrees of freedom for G² are calculated through WLS. If the PARM option were not specified, only the likelihood ratio test would be displayed.

Output 1.1.6: ANOVA
 
Maximum Likelihood Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Active 4 56.57 <.0001
Passive 5 47.94 <.0001
Likelihood Ratio 15 135.17 <.0001
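The likelihood ratio statistic can be checked by hand with the standard formula: twice the sum, over cells, of the observed frequency times the log of observed over predicted. The sketch below applies that formula to the observed and predicted frequencies copied from Output 1.1.8. The data set name Fit and the variable Term are illustrative, and the formula is the textbook definition, not one quoted from this example.

```sas
   data Fit;
      input Observed Predicted @@;
      /* a cell with Observed = 0 contributes nothing to the statistic */
      if Observed > 0 then Term = 2 * Observed * log(Observed / Predicted);
      else Term = 0;
      datalines;
    1  5.259562    5  2.48072     8  8.21586     9  6.648033    0  0.395767
   29 19.18631    14 10.32189    46 34.18491     4 27.66143     0  1.646726
    2 10.93611     3 12.47391     1  5.88343    38 15.76689     2  0.938627
    0  0.219965    0  0.250896    0  0.118337    0  0.39192     1  0.018879
    9  9.657617   25 11.01564     4  5.195624    6 17.20731    13 13.92365
   ;

   /* the sum of Term over the 25 cells is the statistic */
   proc means data=Fit sum;
      var Term;
   run;
```

The sum should come out close to the 135.17 reported in the ANOVA table. The 15 degrees of freedom are consistent with 24 response functions minus 9 estimated parameters.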

Output 1.1.7: Parameter Estimates
 
Analysis of Maximum Likelihood Estimates
Effect   Parameter  Estimate  Standard Error  Chi-Square  Pr > ChiSq
Active   1          0.00284   0.2660          0.00        0.9915
         2          1.4286    0.2277          39.35       <.0001
         3          0.8664    0.2428          12.73       0.0004
         4          -3.0399   0.8031          14.33       0.0002
Passive  5          0.3334    0.1739          3.67        0.0552
         6          0.4650    0.1990          5.46        0.0195
         7          -0.2865   0.2019          2.01        0.1558
         8          0.9110    0.1615          31.81       <.0001
         9          0.6992    0.1530          20.88       <.0001
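The table lists no parameter for monkey w. Under the sum-to-zero coding visible in the _Response_ matrix, the implied effects for w are the negative sums of the listed estimates. The symbols alpha and beta for the Active and Passive effects are notation assumed here.

```latex
\hat{\alpha}_w = -(0.00284 + 1.4286 + 0.8664 - 3.0399) \approx 0.7421
\hat{\beta}_w  = -(0.3334 + 0.4650 - 0.2865 + 0.9110 + 0.6992) \approx -2.1221
```

As a cross-check, response function 5 (Active=r, Passive=w against Active=w, Passive=v) is then approximately 0.00284 - 0.7421 - 2.1221 - 0.6992 = -3.5606, which agrees with the -3.56052 in Output 1.1.5 up to rounding.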

Since the PARM option is specified, the predicted response functions are computed from the WLS fit (this table is not shown here). For the IPF method, the "Maximum Likelihood Predicted Values for Frequencies" table is displayed by default; however, the predicted standard errors are not computed unless the PARM option is specified. The predicted standard errors are computed through WLS.

Output 1.1.8: Predicted Frequencies
 
Maximum Likelihood Predicted Values for Frequencies
                 Observed   Observed   Predicted  Predicted
                            Standard              Standard
Active  Passive  Frequency  Error      Frequency  Error      Residual
r       s        1          0.997725   5.259562   1.361573   -4.25956
r       t        5          2.210512   2.48072    0.691065   2.51928
r       u        8          2.776525   8.21586    1.855129   -0.21586
r       v        9          2.937996   6.648033   1.509317   2.351967
r       w        0          0          0.395767   0.240267   -0.39577
s       r        29         5.017696   19.18631   3.147955   9.813693
s       t        14         3.620648   10.32189   2.16963    3.678112
s       u        46         6.031734   34.18491   4.428728   11.81509
s       v        4          1.981735   27.66143   3.722828   -23.6614
s       w        0          0          1.646726   0.952727   -1.64673
u       r        2          1.407771   10.93611   2.12318    -8.93611
u       s        3          1.720201   12.47391   2.554314   -9.47391
u       t        1          0.997725   5.88343    1.380627   -4.88343
u       v        38         5.606814   15.76689   2.684647   22.23311
u       w        2          1.407771   0.938627   0.551631   1.061373
v       r        0          0          0.219965   0.22182    -0.21997
v       s        0          0          0.250896   0.253756   -0.2509
v       t        0          0          0.118337   0.120336   -0.11834
v       u        0          0          0.39192    0.393325   -0.39192
v       w        1          0.997725   0.018879   0.021731   0.981121
w       r        9          2.937996   9.657617   1.808652   -0.65762
w       s        25         4.707344   11.01564   2.275041   13.98436
w       t        4          1.981735   5.195624   1.18445    -1.19562
w       u        6          2.415857   17.20731   2.772074   -11.2073
w       v        13         3.497402   13.92365   2.241575   -0.92365
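Two columns of this table are easy to verify by hand. The residual is the observed frequency minus the predicted frequency. The observed standard errors are consistent with the multinomial form shown below, where n is the cell count and N = 220 is the total frequency. That form is inferred from the tabulated values, not quoted from the source. For the cell Active=s, Passive=r:

```latex
\text{Residual} = 29 - 19.18631 \approx 9.8137
\qquad
\sqrt{n\left(1 - \frac{n}{N}\right)} = \sqrt{29\left(1 - \frac{29}{220}\right)} \approx 5.0177
```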

The independence model does not fit: the likelihood ratio test, which assesses the omitted Active*Passive interaction, is significant (chi-square = 135.17 with 15 DF, p < .0001). In other words, the active and passive roles of the squirrel monkeys are not independent.

Results from using the ML=NR option instead of the ML=IPF option are very similar, since these are just two different algorithms for maximum likelihood estimation. Because of the sampling zeros in the table, use of the WLS method is not recommended.
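For comparison, the Newton-Raphson fit mentioned above could be requested as follows. This sketch changes only the ML= option; everything else matches the earlier statements.

```sas
   proc catmod data=Display;
      weight wt;
      where Active ^= 't';
      /* ml=nr replaces ml=ipf(parm) from the earlier run */
      model Active*Passive=_response_ 
            / ml=nr zero=sampling;
      loglin Active Passive;
   run;
```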


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.