The continuation-ratio logit model is one type of logistic model for an ordinal (ordered) response, as mentioned in this note. The version of the model that has a full parameter vector for each logit can be estimated either by weighted least squares (WLS) in the SAS/STAT® procedure CATMOD, or by maximum likelihood (ML) in the SAS/STAT procedure LOGISTIC. The ML fit requires fitting a binary logistic model for each continuation-ratio logit. A simpler model that has a single parameter vector that is common to all logits can be fit by WLS in PROC CATMOD.
Weighted Least Squares Estimation
The model with separate parameter vectors can be fit in PROC CATMOD by defining the continuation-ratio response functions in the RESPONSE statement. No response function keyword is available to define continuation-ratio logits automatically (in the way that, for example, the CLOGITS keyword defines cumulative logits). It is therefore necessary to use the matrix multiplication and log transformation capabilities of the RESPONSE statement to define these logits.
This example from Agresti (2002, 2013) illustrates these capabilities. Pregnant mice were exposed to various concentrations of a toxin, and the severity of fetal defects was recorded on an ordinal scale, coded as 1, 2, or 3 in decreasing order of severity.
data micetox;
   do conc=0, 62.5, 125, 250, 500;
      do response=1 to 3;
         input count @@;
         if response=1 then crl1=0; else crl1=1;
         output;
      end;
   end;
   datalines;
15 1 281
17 0 225
22 7 283
38 59 202
144 132 9
;
For a three-level response, two response functions can be defined. Two continuation-ratio logits, log[p1/(p2+p3)] and log[p2/p3], are defined by the RESPONSE statement below. These statements estimate the continuation-ratio logit model by using weighted least squares.
proc catmod;
   weight count;
   response * 1 -1  0  0,
              0  0  1 -1
            log
            * 1 0 0,
              0 1 1,
              0 1 0,
              0 0 1;
            /*  p1
                p2
                p3  */
   direct conc;
   model response=conc / addcell=0.1 freq;
run;
quit;
Operations in the RESPONSE statement are conducted from right to left, with the vector of observed response probabilities understood to be the right-most quantity. Matrices are written with commas separating rows. The preceding RESPONSE statement is written across multiple lines to make it more readable. The comment that follows the RESPONSE statement shows the implicit vector of response probabilities for the three-level response in this example. The RESPONSE statement can be read from right to left. First, the right-most matrix multiplies the probability vector to form the required sums:
(p1 p2+p3 p2 p3)'
Next, the LOG transformation takes the log of each element:
(log(p1) log(p2+p3) log(p2) log(p3))'
Finally, the left-most matrix forms the differences of the logs:
(log(p1)-log(p2+p3) log(p2)-log(p3))'
The result can be rewritten this way. These are the desired continuation-ratio logits.
(log[p1/(p2+p3)] log[p2/p3])'
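The same right-to-left reading can be written compactly in matrix form. The symbols F, A1, A2, and p below are notation introduced here for illustration and do not appear in the PROC CATMOD syntax; the matrix entries are copied from the RESPONSE statement above.

```latex
\[
\mathbf{F}(\mathbf{p}) = \mathbf{A}_1 \log\!\left(\mathbf{A}_2\,\mathbf{p}\right),
\qquad
\mathbf{A}_1 =
\begin{pmatrix}
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1
\end{pmatrix},
\quad
\mathbf{A}_2 =
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix},
\quad
\mathbf{p} =
\begin{pmatrix}
p_1 \\ p_2 \\ p_3
\end{pmatrix}
\]
\[
\mathbf{F}(\mathbf{p}) =
\begin{pmatrix}
\log p_1 - \log(p_2 + p_3) \\
\log p_2 - \log p_3
\end{pmatrix}
=
\begin{pmatrix}
\log\dfrac{p_1}{p_2 + p_3} \\[2ex]
\log\dfrac{p_2}{p_3}
\end{pmatrix}
\]
```

Here the log is applied element by element, which matches the LOG transformation in the RESPONSE statement.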
The ADDCELL= option is used to add a small value to each cell of the data table, because otherwise the zero cell in the data would reduce the number of observed response levels in the second concentration to two and cause a linear dependency in the two response functions. Note that adding different values to the cells would affect the results (as would adding a value to only the zero cell), and you should determine the sensitivity of the results to the various ways of avoiding the dependency. The FREQ option displays the table that PROC CATMOD is analyzing.
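One rough way to gauge this sensitivity before fitting any model is to compute the observed continuation-ratio logits under more than one added constant. The following DATA step is a sketch for illustration and is not part of the original analysis: the data set name CRL_CHECK, the wide layout with variables N1-N3, the variable ADDVAL, and the second constant 0.5 are all assumptions chosen for the example.

```sas
/* Sketch: observed continuation-ratio logits under two added constants.  */
/* CRL_CHECK, N1-N3, ADDVAL, and the constant 0.5 are illustrative        */
/* assumptions; only the counts and the constant 0.1 come from the note.  */
data crl_check;
   input conc n1 n2 n3;
   do addval = 0.1, 0.5;
      /* cell counts after adding the constant to every cell */
      a1 = n1 + addval;
      a2 = n2 + addval;
      a3 = n3 + addval;
      /* the row total cancels in each ratio, so the adjusted */
      /* counts can be used in place of the probabilities     */
      logit1 = log(a1 / (a2 + a3));   /* log[p1/(p2+p3)] */
      logit2 = log(a2 / a3);          /* log[p2/p3]      */
      output;
   end;
   datalines;
0     15   1 281
62.5  17   0 225
125   22   7 283
250   38  59 202
500  144 132   9
;

proc print data=crl_check noobs;
   var conc addval logit1 logit2;
run;
```

At the second concentration, where the observed count for response level 2 is zero, the second logit is determined entirely by the added constant: it is log(0.1/225.1), about -7.7, when 0.1 is added, and log(0.5/225.5), about -6.1, when 0.5 is added. Differences of this size in an observed response function are a reason to refit the model under alternative constants and compare the results.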
If the two concentration parameters were determined to be equal, this model could be simplified to have only a single concentration parameter by including the _RESPONSE_ keyword in the MODEL statement:
proc catmod;
   weight count;
   response * 1 -1  0  0,
              0  0  1 -1
            log
            * 1 0 0,
              0 1 1,
              0 1 0,
              0 0 1;
            /*  p1
                p2
                p3  */
   direct conc;
   model response=_response_ conc / addcell=0.1 freq;
run;
quit;
However, doing so for these data results in significant lack of fit, as indicated by the test for Residual.
For models that involve multiple predictors, you can allow separate parameters across the logits for some predictors while restricting other predictors to have a common parameter across logits. This is done by interacting the _RESPONSE_ keyword with those predictors that should have separate parameters. For example, the following MODEL statement specifies a common parameter for predictor A, but separate parameters for predictor B:
model response = _response_ a b _response_*b;
This also provides an easy test for the difference in a predictor's parameters across the logits. In the preceding model, if the _RESPONSE_*B term is significant, then the parameters for B are not equal across the logits and separate parameters are needed in the model.
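Written out for the two logits, and assuming for illustration that A and B are quantitative predictors named in a DIRECT statement, the structure that this MODEL statement describes can be sketched as follows. The symbols for the intercepts and slopes are notation introduced here; PROC CATMOD's own coding of _RESPONSE_ effects may express the same model with differently parameterized estimates.

```latex
\[
\begin{aligned}
\log\frac{p_1}{p_2 + p_3} &= \alpha_1 + \beta^{A} a + \beta^{B}_{1}\, b, \\
\log\frac{p_2}{p_3}       &= \alpha_2 + \beta^{A} a + \beta^{B}_{2}\, b.
\end{aligned}
\]
```

In this notation, the test of the _RESPONSE_*B term corresponds to the hypothesis that the two B slopes are equal.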
Maximum-Likelihood Estimation
The model that contains separate parameter vectors for the logits can be fit using separate binary logistic models. In this example, the first continuation-ratio logit contrasts the probability of the first response level with the second and third. The variable CRL1 that is defined in the preceding DATA step is a binary variable that provides the same contrast, having value 0 when the response has value 1, and having value 1 when the response has value 2 or 3. The first LOGISTIC step in the following code fits a binary logistic model to this variable, with EVENT="0" specifying that the probability of response level 1 is modeled, to provide the parameter vector (intercept and CONC parameter) for the first continuation-ratio logit. The WHERE statement in the second LOGISTIC step limits the response to levels 2 and 3, and EVENT="2" specifies that the probability of level 2 is modeled, so that the analysis models the second continuation-ratio logit and provides its parameter vector.
ods output goodnessoffit(persist)=gof(where=(Criterion='Deviance'));
proc logistic data=micetox;
   model crl1(event="0")=conc / scale=none aggregate;
   freq count;
run;
proc logistic data=micetox;
   where response in (2,3);
   model response(event="2")=conc / scale=none aggregate;
   freq count;
run;
ods output close;
To get a likelihood-ratio statistic for the overall model, it is necessary to sum the degrees of freedom and the deviance values from the two separate fits. The SCALE=NONE and AGGREGATE options request, for each analysis, the GoodnessOfFit table, which contains the deviance. The ODS OUTPUT statement before the first LOGISTIC step uses the PERSIST option so that a single SAS data set collects the deviances from both models. The second ODS OUTPUT statement closes the data set, making it available for further processing. The statements that follow sum the degrees of freedom and the deviances and compute a chi-square test. The chi-square statistic is the overall model deviance and is comparable to the Residual value that is produced by PROC CATMOD.
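/* Sum the degrees of freedom and deviances from the two fits */
proc summary data=gof;
   var df chisq;
   output out=gof2 sum=df chisq;
run;
/* Upper-tail chi-square p-value for the summed deviance */
data gof2;
   set gof2;
   p=1-probchi(chisq,df);
run;
proc print data=gof2 noobs label;
   var chisq df p;
   label chisq='Chi-square' df='DF' p='Pr > ChiSq';
   format p pvalue.;
   title 'Deviance for Continuation-Ratio Model';
run;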
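The deviances and degrees of freedom from the two fits can be added because the multinomial likelihood at each concentration factors into two binomial pieces, one for each continuation-ratio logit. The sketch below uses notation that is not in the note itself: n1, n2, and n3 are the counts for the three response levels at one concentration, and the two binomial probabilities are defined in the display. The multinomial coefficient is omitted because it does not involve the parameters.

```latex
\[
p_1^{\,n_1}\, p_2^{\,n_2}\, p_3^{\,n_3}
=
\left[\pi_1^{\,n_1} (1 - \pi_1)^{\,n_2 + n_3}\right]
\left[\pi_2^{\,n_2} (1 - \pi_2)^{\,n_3}\right],
\qquad
\pi_1 = p_1, \quad \pi_2 = \frac{p_2}{p_2 + p_3}
\]
\[
\operatorname{logit}(\pi_1) = \log\frac{p_1}{p_2 + p_3},
\qquad
\operatorname{logit}(\pi_2) = \log\frac{p_2}{p_3}
\]
```

When the two logits have separate parameter vectors, each bracketed factor can be maximized on its own, which is what the two LOGISTIC steps do. The first factor uses all three response levels and the second uses only levels 2 and 3, as selected by the WHERE statement.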
Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | SAS/STAT | All | n/a |
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> LOGISTIC; SAS Reference ==> Procedures ==> CATMOD; Analytics ==> Categorical Data Analysis; Analytics ==> Regression |
Date Modified: | 2019-05-03 14:20:07 |
Date Created: | 2006-03-07 14:14:10 |