MODEL Statement
The following options have been added to the MODEL statement.
- ML <=NR | IPF<(options)>>
- computes maximum likelihood estimates (MLE) using either a
Newton-Raphson algorithm (NR) or an iterative proportional
fitting algorithm (IPF).
The option ML=NR (or simply ML) is available when you use logits or
generalized logits, and is the default for generalized logits.
The ML=IPF option is available for fitting a hierarchical log-linear
model with one population (that is, there are no independent
variables and no population variables). The use of bar notation to
describe the log-linear effects guarantees that the model is
hierarchical, meaning that the presence of any interaction
term in the model requires the presence of all its lower-order
terms.
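The hierarchy rule can be checked mechanically. The following Python sketch is only an illustration. `expand_bar` and `is_hierarchical` are hypothetical helpers, not SAS functions. The effect ordering they produce (A, B, A*B, C, ...) is assumed to mirror how bar notation expands.

```python
from itertools import combinations


def expand_bar(*variables):
    """Expand bar notation such as A|B|C into every crossing of the variables.

    Hypothetical helper: each new variable contributes its main effect plus
    its crossing with every effect generated so far.
    """
    effects = []
    for var in variables:
        new_effects = [frozenset([var])] + [effect | {var} for effect in effects]
        effects.extend(new_effects)
    return effects


def is_hierarchical(effects):
    """Return True if every lower-order term of each effect is also present."""
    present = {frozenset(effect) for effect in effects}
    for effect in present:
        for size in range(1, len(effect)):
            for subset in combinations(sorted(effect), size):
                if frozenset(subset) not in present:
                    return False
    return True


print(is_hierarchical(expand_bar("A", "B", "C")))  # True
print(is_hierarchical([{"A"}, {"A", "B"}]))        # False: B is missing
```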
The underlying table in an IPF analysis is the cross-classification
of the observed levels of all dependent variables. If the table is
incomplete, which means that a zero or missing entry occurs in
at least one cell, then all missing cells and all cells with zero
weight are treated as structural zeros by default. This behavior
can be modified with the ZERO= and
MISSING= options in the MODEL statement.
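To make the IPF description concrete, here is a minimal pure-Python sketch of iterative proportional fitting on a table stored as a dictionary of cells. It is not PROC CATMOD's implementation. The function names, the dictionary layout, the CELL-style stopping rule, and the `max_iter` cap are all assumptions made for illustration. Structural zeros start at 0, and proportional scaling keeps them at 0, so they drop out of the fit.

```python
from collections import defaultdict


def margins(table, config):
    """Sum cell values over every axis not listed in config (a tuple of axes)."""
    totals = defaultdict(float)
    for cell, value in table.items():
        totals[tuple(cell[axis] for axis in config)] += value
    return totals


def ipf(observed, configs, structural_zeros=(), eps=1e-3, max_iter=100):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    observed: dict mapping cell tuples (one index per dependent variable) to counts.
    configs: highest-order margins of the model, e.g. ((0, 1), (0, 2), (1, 2)).
    structural_zeros: cells fixed at zero; they start at 0 and scaling keeps them there.
    Stops when no fitted cell changes by eps or more (a CELL-style criterion).
    """
    zeros = set(structural_zeros)
    fitted = {cell: 0.0 if cell in zeros else 1.0 for cell in observed}
    kept = {cell: n for cell, n in observed.items() if cell not in zeros}
    targets = {config: margins(kept, config) for config in configs}
    for _ in range(max_iter):
        previous = dict(fitted)
        for config in configs:
            target = targets[config]
            current = margins(fitted, config)
            for cell in fitted:
                key = tuple(cell[axis] for axis in config)
                if current[key] > 0:
                    # Scale so this fitted margin matches the observed margin.
                    fitted[cell] *= target[key] / current[key]
        if max(abs(fitted[c] - previous[c]) for c in fitted) < eps:
            break
    return fitted


counts = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
fit = ipf(counts, configs=((0,), (1,)))  # independence model [A][B]
print(round(fit[(0, 0)], 6))  # 12.0, that is, 30 * 40 / 100
```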
You can control the convergence of the two algorithms with the
EPSILON= and MAXITER= options in the MODEL statement.
Note: The RESTRICT statement is not available with the
ML=IPF option.
You can specify the following options within
parentheses after the ML=IPF option.
- CONV=keyword
CONVCRIT=keyword
- specifies the criterion that determines when the IPF algorithm
has converged. You can specify one of the following keywords:
- CELL
- termination requires the maximum absolute difference between
consecutive cell estimates to be less than 0.001 (or the value of the EPSILON=
option, if specified).
- LOGL
- termination requires the relative difference between consecutive
estimates of the log likelihood
to be less than 1E-8 (or the value of the EPSILON= option,
if specified). This is the default.
- MARGIN
- termination requires
the maximum absolute difference between consecutive margin
estimates to be less than 0.001 (or the value of the EPSILON=
option, if specified).
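The three CONV= criteria can be written as comparisons between consecutive iterations. This Python sketch is an illustration, not the procedure's code. The following are all assumptions:

- the dictionary layout and helper names;
- the log-likelihood kernel (the sum of n times log m over cells with positive counts);
- using the previous log likelihood as the denominator of the relative difference.

Only the default tolerances come from the descriptions above.

```python
import math

# Default tolerances quoted in the CONV= descriptions; EPSILON= would override them.
DEFAULT_EPS = {"CELL": 1e-3, "LOGL": 1e-8, "MARGIN": 1e-3}


def log_likelihood(observed, fitted):
    """Assumed kernel of the log likelihood: sum of n * log(m) over cells with n > 0."""
    return sum(n * math.log(fitted[cell]) for cell, n in observed.items() if n > 0)


def converged(conv, prev, curr, eps=None):
    """Apply a CONV= keyword to two consecutive iterations.

    prev and curr are hypothetical dicts holding 'cells' (fitted cells),
    'margins' (fitted margins), and 'logl' (log likelihood).
    An unknown keyword raises KeyError from the DEFAULT_EPS lookup.
    """
    conv = conv.upper()
    eps = DEFAULT_EPS[conv] if eps is None else eps
    if conv == "CELL":
        change = max(abs(curr["cells"][k] - prev["cells"][k]) for k in curr["cells"])
    elif conv == "MARGIN":
        change = max(abs(curr["margins"][k] - prev["margins"][k]) for k in curr["margins"])
    else:  # LOGL: relative change, guarding against a zero denominator
        change = abs(curr["logl"] - prev["logl"]) / (abs(prev["logl"]) or 1.0)
    return change < eps


prev = {"cells": {0: 12.0}, "margins": {0: 30.0}, "logl": -100.0}
curr = {"cells": {0: 12.0005}, "margins": {0: 30.002}, "logl": -100.0000001}
print(converged("CELL", prev, curr))    # True
print(converged("MARGIN", prev, curr))  # False
print(converged("LOGL", prev, curr))    # True
```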
- DF=keyword
- specifies the method used to compute the degrees of freedom for
the goodness-of-fit G² test (labeled "Likelihood Ratio"
in the "Estimates" table).
For a complete table (a table having nonzero entries in
every cell), the degrees of freedom are calculated as the number
of cells in the table (nc) minus the number of independent
parameters specified in the model (np). For incomplete tables,
these degrees of freedom may be adjusted by the number of fitted
zeros (nz, which includes the number of structural zeros) and
the number of non-estimable parameters due to the zeros (nn).
If you are analyzing an incomplete table, you should verify that
the degrees of freedom are correct.
You can specify one of the following keywords:
- UNADJ
- computes the unadjusted degrees of freedom as
nc-np. These are the same degrees of freedom you would get
if all cells in the table were positive.
- ADJ
- computes the degrees of freedom as
(nc-np)-(nz-nn) (Bishop, Fienberg, and Holland
1975), which adjusts for fitted zeros and non-estimable
parameters. This is the default, and for complete tables gives
the same results as the UNADJ option.
- ADJEST
- computes the degrees of freedom as (nc-np)-nz,
which adjusts for fitted zeros only. This gives a lower bound
on the true degrees of freedom.
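The three DF= rules reduce to simple arithmetic on nc, np, nz, and nn. The Python function below is a hypothetical helper, not a SAS interface, that restates them. The counts in the example are made up to show how the methods differ.

```python
def goodness_of_fit_df(nc, np, nz=0, nn=0, method="ADJ"):
    """Degrees of freedom for the G² test under the DF= keywords.

    nc: cells in the table; np: independent model parameters;
    nz: fitted zeros (including structural zeros); nn: non-estimable parameters.
    """
    method = method.upper()
    if method == "UNADJ":
        return nc - np
    if method == "ADJ":
        return (nc - np) - (nz - nn)
    if method == "ADJEST":
        return (nc - np) - nz
    raise ValueError(f"unknown DF= keyword: {method}")


# Hypothetical incomplete table: 27 cells, 19 parameters,
# 3 fitted zeros, and 1 non-estimable parameter.
for method in ("UNADJ", "ADJ", "ADJEST"):
    print(method, goodness_of_fit_df(27, 19, nz=3, nn=1, method=method))
# UNADJ 8, ADJ 6, ADJEST 5
```

For a complete table (nz = nn = 0), all three methods agree. Because nn is never negative, ADJEST never exceeds ADJ, which is consistent with its description as a lower bound.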
- PARM
- computes parameter estimates, generates the "ANOVA,"
"Parameter Estimates," and "Predicted Values of
Response Functions" tables, and includes the predicted standard
errors in the "Predicted Values of Frequencies" and
"Predicted Values of Probabilities" tables.
When you specify the PARM option, the algorithm used to obtain the
maximum likelihood parameter estimates is weighted least squares
on the IPF-predicted frequencies. This algorithm can be much
faster than the Newton-Raphson algorithm that is used when you
specify the ML=NR option. In the resulting ANOVA table, the likelihood
ratio is computed from the initial IPF fit, while the degrees of
freedom are generated from the WLS analysis; you can override the
degrees of freedom with the DF= option. The initial response function,
which the WLS method usually computes from the raw data, is computed
from the IPF fitted frequencies.
If there are any zero marginals in the configurations that define
the model, then predicted cell frequencies of zero will result and
WLS cannot be used to compute the estimates. In this case, PROC
CATMOD automatically changes the algorithm from ML=IPF to ML=NR
and prints a note in the log.
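The PARM behavior can be sketched in two steps. First, build response functions from the IPF fitted frequencies. Second, fall back to Newton-Raphson when a fitted frequency is zero, since the logarithm of zero is undefined. In this Python sketch, two things are assumptions for illustration: using generalized logits against the last cell as the response functions, and the helper names. The WLS step itself is omitted.

```python
import math


def response_functions(fitted):
    """Generalized logits log(m_i / m_r), using the last cell as reference r.

    `fitted` lists the fitted frequencies of the non-structural cells only.
    """
    reference = fitted[-1]
    return [math.log(m / reference) for m in fitted[:-1]]


def choose_algorithm(fitted):
    """Mimic the documented fallback: zero predicted frequencies rule out WLS."""
    if any(m <= 0 for m in fitted):
        return "ML=NR"  # PROC CATMOD also prints a note in the log
    return "ML=IPF with WLS"


ipf_fit = [12.0, 18.0, 28.0, 42.0]
print(choose_algorithm(ipf_fit))                   # ML=IPF with WLS
print([round(f, 4) for f in response_functions(ipf_fit)])
print(choose_algorithm([12.0, 0.0, 28.0, 42.0]))   # ML=NR
```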
- MISS=keyword | value
MISSING=keyword | value
- specifies whether a missing cell is treated as a sampling or
structural zero.
Structural zero cells are removed from the analysis
since their expected values are zero, while sampling zero
cells may have nonzero expected values.
For a single population, missing cells are treated as
structural zeros by default. For multiple populations, a missing
cell for a given population and response profile is treated as a
sampling zero by default, as long as some population has a
nonzero count for that response profile.
The following table displays the available keywords
and summarizes how PROC CATMOD treats missing values for one or
more populations.
| MISSING= | One Population | Multiple Populations |
|---|---|---|
| STRUCTURAL (default) | structural zeros | sampling zeros |
| SAMP or SAMPLING | sampling zeros | sampling zeros |
| value | sets missing weights and cells to value | sets missing weights and cells to value |
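The MISSING= table amounts to a small lookup. The Python helper below is hypothetical and encodes only the table. It does not model the additional condition, stated above for multiple populations, that some population must have a nonzero count for the response profile.

```python
def missing_cell_treatment(option="STRUCTURAL", n_populations=1):
    """Look up how a missing cell is treated under MISSING= (per the table)."""
    if isinstance(option, (int, float)):
        return float(option)  # the missing weight is replaced by this value
    option = option.upper()
    if option in ("SAMP", "SAMPLING"):
        return "sampling zero"
    if option == "STRUCTURAL":
        return "structural zero" if n_populations == 1 else "sampling zero"
    raise ValueError(f"unknown MISSING= keyword: {option}")


print(missing_cell_treatment())                 # structural zero
print(missing_cell_treatment(n_populations=3))  # sampling zero
print(missing_cell_treatment(0.5))              # 0.5
```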
- ZERO=keyword | value
ZEROS=keyword | value
ZEROES=keyword | value
- specifies whether a non-missing cell with zero weight
in the data set is treated as a sampling or structural zero.
Structural zero cells are removed from the analysis
since their expected values are zero, while sampling zero
cells may have nonzero expected values.
For a single population, zero cells are treated as
structural zeros by default. For multiple populations, a zero
cell for a given population and response profile is treated as a
sampling zero by default, as long as some population has a
nonzero count for that response profile.
The following table displays the available keywords
and summarizes how PROC CATMOD treats zeros for one or
more populations.
| ZERO= | One Population | Multiple Populations |
|---|---|---|
| STRUCTURAL (default) | structural zeros | sampling zeros |
| SAMP or SAMPLING | sampling zeros | sampling zeros |
| value | sets zero weights to value | sets zero weights to value |
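For a single population, the ZERO= and MISSING= rules decide which cells enter an IPF fit as structural zeros. The following Python sketch prepares such input. The function name, the use of None to mark a missing cell, and the dictionary layout are assumptions. Multiple populations are deliberately not handled.

```python
def prepare_single_population(weights, zero="STRUCTURAL", missing="STRUCTURAL"):
    """Apply ZERO= and MISSING= (one-population column) to raw cell weights.

    weights: dict mapping cell tuples to a weight, or None for a missing cell.
    Returns (observed, structural_zeros) suitable for an IPF routine.
    """
    observed, structural = {}, set()
    for cell, weight in weights.items():
        if weight is None:
            option = missing
        elif weight == 0:
            option = zero
        else:
            observed[cell] = float(weight)  # ordinary positive cell
            continue
        if isinstance(option, (int, float)):
            observed[cell] = float(option)  # ZERO=value or MISSING=value
        elif option.upper() == "STRUCTURAL":
            observed[cell] = 0.0
            structural.add(cell)            # removed from the fit
        elif option.upper() in ("SAMP", "SAMPLING"):
            observed[cell] = 0.0            # sampling zero: stays in the fit
        else:
            raise ValueError(f"unknown keyword: {option}")
    return observed, structural


raw = {(0, 0): 10, (0, 1): 0, (1, 0): None, (1, 1): 5}
observed, zeros = prepare_single_population(raw, zero="SAMPLING")
print(sorted(zeros))  # [(1, 0)]
```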
Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.