The CATMOD Procedure

MODEL Statement

The following options have been added to the MODEL statement.

ML < = NR | IPF< ( options ) >>
computes maximum likelihood estimates (MLE) using either a Newton-Raphson algorithm (NR) or an iterative proportional fitting algorithm (IPF).

The option ML=NR (or simply ML) is available when you use logits or generalized logits, and is the default for generalized logits.
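As a rough illustration of the ML=NR option, the following sketch fits a generalized logit model by Newton-Raphson. The data set Survey and the variables Brand, Region, and Count are hypothetical names chosen for this example.

```sas
/* Hypothetical data: one row per Region-by-Brand cell,
   with Count holding the cell frequency */
proc catmod data=Survey;
   weight Count;
   /* Generalized logits are the default response functions,
      so ML=NR is already the default here and could be omitted */
   model Brand = Region / ml=nr;
run;
quit;
```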

The ML=IPF option is available for fitting a hierarchical log-linear model with one population (that is, there are no independent variables and no population variables). The use of bar notation to describe the log-linear effects guarantees that the model is hierarchical, meaning that the presence of any interaction term in the model requires the presence of all its lower-order terms. The underlying table in an IPF analysis is the cross-classification of the observed levels of all dependent variables. If the table is incomplete, which means that a zero or missing entry occurs in at least one cell, then all missing cells and all cells with zero weight are treated as structural zeros by default. This behavior can be modified with the ZERO= and MISSING= options in the MODEL statement.

You can control the convergence of the two algorithms with the EPSILON= and MAXITER= options in the MODEL statement.
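A minimal sketch of an IPF fit follows, assuming a hypothetical data set Tab3Way that holds one row per cell of a three-way table (variables A, B, and C) with the cell frequency in Count. The EPSILON= and MAXITER= values are arbitrary and shown only to indicate where the options go.

```sas
proc catmod data=Tab3Way;
   weight Count;
   /* One population: no independent variables, only _RESPONSE_ */
   model A*B*C = _response_ / ml=ipf epsilon=1e-6 maxiter=50;
   /* Bar notation expands to A, B, C, A*B, and A*C,
      which keeps the model hierarchical */
   loglin A|B A|C;
run;
quit;
```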

Note: The RESTRICT statement is not available with the ML=IPF option.

You can specify the following options within parentheses after the ML=IPF option.

CONV=keyword
CONVCRIT=keyword
specifies the method that determines when convergence of the IPF algorithm occurs. You can specify one of the following keywords:
CELL
termination requires the maximum absolute difference between consecutive cell estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).
LOGL
termination requires the relative difference between consecutive estimates of the log likelihood to be less than 1E-8 (or the value of the EPSILON= option, if specified). This is the default.
MARGIN
termination requires the maximum absolute difference between consecutive margin estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).
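The sketch below shows one way the convergence method might be selected. It reuses the hypothetical Tab3Way data set (variables A, B, C, and Count); the tolerance of 0.0001 is an arbitrary choice that replaces the 0.001 default for the CELL criterion.

```sas
proc catmod data=Tab3Way;
   weight Count;
   /* Stop when no cell estimate changes by 0.0001 or more
      between consecutive iterations */
   model A*B*C = _response_ / ml=ipf(conv=cell) epsilon=0.0001;
   loglin A|B|C @2;   /* all main effects and two-way interactions */
run;
quit;
```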

DF=keyword
specifies the method used to compute the degrees of freedom for the goodness-of-fit G² test (labeled "Likelihood Ratio" in the "Estimates" table).

For a complete table (a table having nonzero entries in every cell), the degrees of freedom are calculated as the number of cells in the table (nc) minus the number of independent parameters specified in the model (np). For incomplete tables, these degrees of freedom may be adjusted by the number of fitted zeros (nz, which includes the number of structural zeros) and the number of non-estimable parameters due to the zeros (nn). If you are analyzing an incomplete table, you should verify that the degrees of freedom are correct.

You can specify one of the following keywords:
UNADJ
computes the unadjusted degrees of freedom as nc-np. These are the same degrees of freedom you would get if all cells in the table were positive.
ADJ
computes the degrees of freedom as (nc-np)-(nz-nn) (Bishop, Fienberg, and Holland 1975), which adjusts for fitted zeros and non-estimable parameters. This is the default, and for complete tables gives the same results as the UNADJ option.
ADJEST
computes the degrees of freedom as (nc-np)-nz, which adjusts for fitted zeros only. This gives a lower bound on the true degrees of freedom.
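To make the three formulas concrete, the following DATA step evaluates them for made-up counts (nc=12, np=7, nz=2, nn=1). The numbers are purely illustrative and do not come from any real table.

```sas
data _null_;
   nc = 12;   /* cells in the table (hypothetical)               */
   np = 7;    /* independent parameters in the model              */
   nz = 2;    /* fitted zeros, including structural zeros         */
   nn = 1;    /* parameters made non-estimable by the zeros       */
   df_unadj  = nc - np;               /* UNADJ  */
   df_adj    = (nc - np) - (nz - nn); /* ADJ    */
   df_adjest = (nc - np) - nz;        /* ADJEST */
   put df_unadj= df_adj= df_adjest=;
run;
```

With these values the three methods give 5, 4, and 3 degrees of freedom respectively, so ADJEST is the smallest of the three, as expected for a lower bound.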

PARM
computes parameter estimates, generates the "ANOVA," "Parameter Estimates," and "Predicted Values of Response Functions" tables, and includes the predicted standard errors in the "Predicted Values of Frequencies" and "Predicted Values of Probabilities" tables.

When you specify the PARM option, the algorithm used to obtain the maximum likelihood parameter estimates is weighted least squares on the IPF-predicted frequencies. This algorithm can be much faster than the Newton-Raphson algorithm used if you just specify the ML=NR option. In the resulting ANOVA table, the likelihood ratio is computed from the initial IPF fit while the degrees of freedom are generated from the WLS analysis; you can override this with the DF= option. The initial response function, which the WLS method usually computes from the raw data, is computed from the IPF fitted frequencies.

If there are any zero marginals in the configurations that define the model, then predicted cell frequencies of zero will result and WLS cannot be used to compute the estimates. In this case, PROC CATMOD automatically changes the algorithm from ML=IPF to ML=NR and prints a note in the log.
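A possible invocation that requests parameter estimates after the IPF fit is sketched below. The data set Tab3Way and its variables are hypothetical, and the sketch assumes that several IPF options can be listed together inside the parentheses, separated by blanks.

```sas
proc catmod data=Tab3Way;
   weight Count;
   /* PARM adds the parameter estimates and related tables;
      DF=ADJ states the default degrees-of-freedom method explicitly */
   model A*B*C = _response_ / ml=ipf(parm df=adj);
   loglin A|B A|C;
run;
quit;
```

If any margin that defines the model is zero, check the log for the note that the procedure has switched to ML=NR.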

MISS=keyword | value
MISSING=keyword | value
specifies whether a missing cell is treated as a sampling or structural zero.

Structural zero cells are removed from the analysis since their expected values are zero, while sampling zero cells may have nonzero expected value. For a single population, the missing cells are treated as structural zeros by default. For multiple populations, a missing cell is treated as a sampling zero by default, provided that at least one population has a nonzero count for that response profile.

The following table displays the available keywords and summarizes how PROC CATMOD treats missing values for one or more populations.

MISSING=               One Population                            Multiple Populations
STRUCTURAL (default)   structural zeros                          sampling zeros
SAMP | SAMPLING        sampling zeros                            sampling zeros
value                  sets missing weights and cells to value   sets missing weights and cells to value
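The following sketch shows how a missing cell might be treated as a sampling zero in a one-population IPF analysis. The data set Visits and its values are invented for this example; the combination Clinic=B, Outcome=fair is deliberately absent.

```sas
data Visits;                       /* hypothetical data */
   input Clinic $ Outcome $ Count;
   datalines;
A good 12
A fair 4
A poor 3
B good 7
B poor 5
;

/* The (B, fair) combination never appears, so it is a missing cell.
   MISSING=SAMPLING overrides the one-population default of
   treating it as a structural zero. */
proc catmod data=Visits;
   weight Count;
   model Clinic*Outcome = _response_ / ml=ipf missing=sampling;
   loglin Clinic Outcome;
run;
quit;
```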


ZERO=keyword | value
ZEROS=keyword | value
ZEROES=keyword | value
specifies whether a non-missing cell with zero weight in the data set is treated as a sampling or structural zero.

Structural zero cells are removed from the analysis since their expected values are zero, while sampling zero cells have nonzero expected value. For a single population, the zero cells are treated as structural zeros by default. For multiple populations, a zero cell is treated as a sampling zero by default, provided that at least one population has a nonzero count for that response profile.

The following table displays the available keywords and summarizes how PROC CATMOD treats zeros for one or more populations.

ZERO=                  One Population               Multiple Populations
STRUCTURAL (default)   structural zeros             sampling zeros
SAMP | SAMPLING        sampling zeros               sampling zeros
value                  sets zero weights to value   sets zero weights to value
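As a sketch of the ZERO= option, the example below includes a cell that is present in the data set with a weight of zero and asks for it to be treated as a sampling zero. The data set Defects and its values are made up for illustration.

```sas
data Defects;                      /* hypothetical data */
   input Line $ Grade $ Count;
   datalines;
L1 high 20
L1 low  0
L2 high 15
L2 low  6
;

/* The (L1, low) cell is non-missing but has zero weight.
   ZERO=SAMPLING keeps it in the analysis instead of removing it
   as a structural zero, which is the one-population default. */
proc catmod data=Defects;
   weight Count;
   model Line*Grade = _response_ / ml=ipf zero=sampling;
   loglin Line Grade;
run;
quit;
```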


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.