The CATMOD Procedure

Computational Formulas

The algorithm used for iterative proportional fitting (IPF) is described in Haberman (1972), Bishop, Fienberg, and Holland (1975), and Agresti (1990). To illustrate the method, consider the observed three-dimensional table \{n_{ijk}\} for the variables X, Y, and Z. The statements

   proc catmod;
      model X*Y*Z = _response_ / ml=ipf;
      loglin X|Y|Z@2;
   run;

request that PROC CATMOD use IPF to fit the hierarchical model

\log(m_{ijk})=\mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}
where \{m_{ijk}\} are the expected frequencies of the cells in the contingency table.

PROC CATMOD begins with a table of initial cell estimates \{\hat m_{ijk}^{(0)}\} that are produced by setting the n_{sz} structural zero cells to 0 and all other cells to n/(n_c-n_{sz}), where n is the total weight of the table and n_c is the total number of cells in the table. It then iteratively adjusts the estimates at step s-1, \{\hat m_{ijk}^{(s-1)}\}, to the observed marginal tables specified in the model by stepping through the following three-stage process to produce the estimates at step s:
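The starting table described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not PROC CATMOD's internal code; the function name `initial_estimates`, the nested-list layout, and the representation of structural zeros as a set of (i, j, k) index tuples are assumptions made for the example.

```python
def initial_estimates(observed, structural_zeros=frozenset()):
    """Return the starting table: 0 in structural-zero cells,
    n / (n_c - n_sz) in every other cell.

    observed         -- I x J x K nested list of cell weights
    structural_zeros -- set of (i, j, k) tuples (hypothetical representation)
    """
    I, J, K = len(observed), len(observed[0]), len(observed[0][0])
    n = sum(observed[i][j][k]
            for i in range(I) for j in range(J) for k in range(K))
    n_c = I * J * K                  # total number of cells
    n_sz = len(structural_zeros)     # number of structural-zero cells
    fill = n / (n_c - n_sz)
    return [[[0.0 if (i, j, k) in structural_zeros else fill
              for k in range(K)]
             for j in range(J)]
            for i in range(I)]


# A 2 x 2 x 2 table with total weight 70 and one structural zero.
observed = [[[10, 5], [8, 12]],
            [[0, 9], [11, 15]]]
start = initial_estimates(observed, structural_zeros={(1, 0, 0)})
```

In this example the seven remaining cells each start at 70/7, so the starting table carries the same total weight as the observed table.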

\hat m_{ijk}^{(s_1)}=\hat m_{ijk}^{(s-1)}\frac{n_{ij\cdot}}{\hat m_{ij\cdot}^{(s-1)}}
\hat m_{ijk}^{(s_2)}=\hat m_{ijk}^{(s_1)}\frac{n_{i\cdot k}}{\hat m_{i\cdot k}^{(s_1)}}
\hat m_{ijk}^{(s)}=\hat m_{ijk}^{(s_2)}\frac{n_{\cdot jk}}{\hat m_{\cdot jk}^{(s_2)}}
where the superscripts (s_1) and (s_2) indicate the two intermediate tables in the process, and the subscript "·" indicates summation over the missing subscript. The log likelihood l_s is estimated at each step s by
l_s=\sum_{i,j,k} n_{ijk}\log\left(\frac{\hat m_{ijk}^{(s)}}{n}\right)
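One pass of the three-stage adjustment, together with the log-likelihood estimate, might look like the following Python sketch. It assumes the model with all two-way margins (XY, XZ, YZ) used in this section and a table stored as nested lists. The names `ipf_cycle` and `log_likelihood` are hypothetical, and two conventions are assumptions of the sketch rather than documented behavior: margin groups whose fitted sum is 0 are left unchanged, and cells with zero observed weight contribute nothing to the log likelihood.

```python
import math


def ipf_cycle(m_prev, n_obs):
    """One IPF pass: fit the XY margin, then XZ, then YZ."""
    I, J, K = len(n_obs), len(n_obs[0]), len(n_obs[0][0])
    m = [[list(row) for row in plane] for plane in m_prev]  # work on a copy

    def scale(cells):
        # Rescale one group of cells so its fitted sum equals the observed sum.
        fitted = sum(m[i][j][k] for i, j, k in cells)
        target = sum(n_obs[i][j][k] for i, j, k in cells)
        if fitted > 0:  # assumption: all-zero groups are left unchanged
            for i, j, k in cells:
                m[i][j][k] *= target / fitted

    for i in range(I):          # stage 1: match n_{ij.}
        for j in range(J):
            scale([(i, j, k) for k in range(K)])
    for i in range(I):          # stage 2: match n_{i.k}
        for k in range(K):
            scale([(i, j, k) for j in range(J)])
    for j in range(J):          # stage 3: match n_{.jk}
        for k in range(K):
            scale([(i, j, k) for i in range(I)])
    return m


def log_likelihood(m, n_obs):
    """l_s = sum over cells of n_ijk * log(m_ijk / n)."""
    pairs = [(obs, fit)
             for n_plane, m_plane in zip(n_obs, m)
             for n_row, m_row in zip(n_plane, m_plane)
             for obs, fit in zip(n_row, m_row)]
    n = sum(obs for obs, _ in pairs)
    # assumption: cells with zero observed weight are skipped (0 * log = 0)
    return sum(obs * math.log(fit / n) for obs, fit in pairs if obs > 0)


observed = [[[10, 5], [8, 12]],
            [[6, 9], [11, 15]]]
start = [[[9.5, 9.5], [9.5, 9.5]],
         [[9.5, 9.5], [9.5, 9.5]]]   # uniform table, total weight 76
after = ipf_cycle(start, observed)
```

Because the YZ margin is fitted last, the table returned by one pass reproduces the observed YZ margin exactly (up to rounding), while the XY and XZ margins are matched only approximately until the iterations converge.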
When the function |(l_{s-1}-l_s)/l_{s-1}| is less than 1E-8, the iterations terminate. You can change the comparison value with the EPSILON= option, and you can change the convergence criterion with the CONV= option. The option CONV=CELL uses the maximum absolute cell difference
\max_{i,j,k}| d_{ijk}| < 0.001 \quad \text{where} \quad d_{ijk}=\hat m_{ijk}^{(s-1)}-\hat m_{ijk}^{(s)}
as the criterion, while the option CONV=MARGIN uses the maximum absolute difference of the margins
\max(\max_{i,j}| d_{ij\cdot}|,\max_{i,k}| d_{i\cdot k}|,\max_{j,k}| d_{\cdot jk}| ) < 0.001
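The three quantities that these criteria compare against a threshold can be computed from two successive tables, as in the following Python sketch. The helper names (`rel_loglik_change`, `max_cell_change`, `max_margin_change`) are hypothetical and the nested-list layout is an assumption; the sketch only evaluates the formulas above and does not reproduce how PROC CATMOD applies the EPSILON= and CONV= options internally.

```python
def rel_loglik_change(l_prev, l_curr):
    """|(l_{s-1} - l_s) / l_{s-1}|, the default criterion (threshold 1E-8)."""
    return abs((l_prev - l_curr) / l_prev)


def max_cell_change(prev, curr):
    """max |d_ijk|, the quantity used by CONV=CELL."""
    return max(abs(p - c)
               for p_plane, c_plane in zip(prev, curr)
               for p_row, c_row in zip(p_plane, c_plane)
               for p, c in zip(p_row, c_row))


def max_margin_change(prev, curr):
    """Largest absolute change over the XY, XZ, and YZ margins (CONV=MARGIN)."""
    I, J, K = len(prev), len(prev[0]), len(prev[0][0])
    # d_ijk = previous estimate minus current estimate
    d = [[[prev[i][j][k] - curr[i][j][k] for k in range(K)]
          for j in range(J)] for i in range(I)]
    xy = max(abs(sum(d[i][j][k] for k in range(K)))
             for i in range(I) for j in range(J))
    xz = max(abs(sum(d[i][j][k] for j in range(J)))
             for i in range(I) for k in range(K))
    yz = max(abs(sum(d[i][j][k] for i in range(I)))
             for j in range(J) for k in range(K))
    return max(xy, xz, yz)


prev = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
curr = [[[1.25, 2.0], [3.25, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]

cell_done = max_cell_change(prev, curr) < 0.001
margin_done = max_margin_change(prev, curr) < 0.001
```

In this small example two cells that share the same X and Z levels each move by 0.25, so the largest cell change is 0.25 while the largest margin change is 0.5. The two criteria can therefore stop at different iterations for the same sequence of tables.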

Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.