PROC LOGISTIC automatically computes a test of the proportional odds assumption when the response is ordinal and the default logit link is used. For such a response, several cumulative logits are simultaneously modeled while only a single logit is modeled for a binary response. Conceptually, you could fit a model that has a complete set of parameter estimates for each of the cumulative logits. The simpler model that PROC LOGISTIC fits constrains each predictor's parameter estimates to be the same across all of the logits. This means that the fitted surfaces for the logits are all parallel and they are only allowed to differ by a constant shift that necessitates the separate intercepts that you get when you fit an ordinal model. When the logit link is used, this parallelism assumption also implies that the effect of a given predictor is the same regardless of where you "cut" the ordinal response to dichotomize it. The proportional odds test in PROC LOGISTIC simply tests whether the parameters are the same across logits, simultaneously for all predictors. PROC GENMOD fits the same proportional odds model, but it does not provide a proportional odds test.
Peterson and Harrell (1990) note that this proportional odds/parallelism test is known to be liberal when the sample size is small, which means that the p-value for the test could be artificially too small in small samples, possibly leading to inappropriate rejection of the proportional odds assumption. Because of this, graphical assessment of the parallelism assumption is useful and is discussed and illustrated in this note. However, nonsignificant test results (large p-values) are not affected and reliably indicate adequacy of the proportional odds/parallel assumption. Stokes, et. al. (2012) suggests doing cross-tabulations of the response with each predictor involved in the model. If all cell counts are about five or larger, then the sample size should be adequate.
If the sample size is adequate and the test or the graphs convince you to reject the assumption of proportional odds, then there are two alternative models that are often used. The first is the generalized logit model which can be fit by simply specifying the LINK=GLOGIT option in the MODEL statement of PROC LOGISTIC. This model treats the response as nominal (unordered) rather than ordinal and has a full set of parameters for each generalized logit.
The second alternative is the partial proportional odds model. This model continues to treat the response as ordinal, but allows you to assume proportional odds for some predictors while not for others. To do this, the model contains separate parameters across the logits for model effects exhibiting a lack of proportionality. A single parameter is used for effects where proportionality holds. At the extreme, the fully nonproportional odds model can be fit in which separate parameters are used for every effect in the model. Beginning in SAS® 9.3 TS1M2, you can fit this model using the UNEQUALSLOPES option in the MODEL statement of PROC LOGISTIC. In earlier releases, the model can also be fit using PROC NLMIXED as shown in the addendum at the end of this note.
In the following sections, proportional, fully nonproportional, and partial proportional odds models are fit to data from a study of a new analgesic for dental pain relief. The DENT data set can be found at this location. Increasing relief from pain was recorded on a 5-point scale (0-4). Initially, PROC LOGISTIC is used to fit a proportional odds model involving three categorical predictors: the research center (CENTER=1 or 2), the treatment dosage (TRT=ACL, TL, ACH, TH, or P), and baseline pain (BASELINE=0 or 1). The proportional odds test is significant and tables produced with PROC FREQ (not shown) of each predictor against the response show generally adequate sample size. The alternative models are then fit.
Derr (2013) also discusses the testing and assessment of the proportional odds assumption. When the proportional odds assumption is rejected, he illustrates fitting the partial proportional odds model.
The following statements fit the proportional odds model to the dental data using PROC LOGISTIC. In PROC LOGISTIC, the CLASS statement creates reference-coded dummy variables for each of the three categorical predictors. The ORDER=DATA option causes predictor levels to be ordered as they first appear in the data set. The DESCENDING response variable option allows you to model the probability of the higher response levelsNOTE. The OUTPUT statement saves the cumulative predicted probabilities and the individual predicted probabilities to data set LOG.
proc logistic data=dent; class center baseline trt / param=ref order=data; model resp(descending)=center baseline trt; output out=log predprobs=(c i); run;
The test of the proportional odds assumption in PROC LOGISTIC is significant (p=0.0089) indicating that proportional odds does not hold and suggesting that separate parameters are needed across the logits for at least one predictor. A visual assessment of the assumption is provided by plotting the empirical logits.
|
The addition of the UNEQUALSLOPES option causes PROC LOGISTIC to fit the fully nonproportional odds model which includes separate slope parameters for each effect on each of the logits. If you also include the EQUALSLOPES option (available beginning in SAS 9.4 TS1M2), the model includes both a common, reference slope (slope on last logit) for each effect as well as parameters representing the differences between the reference slope and the separate slopes. The results include joint tests of these sets of difference parameters in the "Type 3 Analysis of Effects" table. Each set is labeled U_effect which is a test of the proportional odds assumption for effect. These statements fit the fully nonproportional odds model, parametrized to test the proportional odds assumption in each model effect.
proc logistic data=dent; class center baseline trt / param=ref order=data; model resp(descending) = center baseline trt / unequalslopes equalslopes; run;
The nonsignificant tests for the CENTER and BASELINE effects (p=0.1782 and p=0.1727, respectively) suggest that the proportional odds assumption is reasonable for those effects. The test of proportional odds for the TRT effect (p=0.0688) is marginally significant. This seems consistent with the plots of the empirical logits.
|
In the Parameter Estimates table, notice that the parameter estimates for each effect include a common, reference slope parameter (slope on RESP=1 logit) followed by three difference parameters for a total of four degrees of freedom. The three difference parameters for CENTER and BASELINE are all nonsignificant, as expected, but the difference parameters for TRT="ACH" show some significance which explains the marginally significant proportional odds test for the TRT effect.
|
Note that you can also test for proportionality in each of the model effects and overall by omitting the EQUALSLOPES option and including appropriate TEST statements. When the UNEQUALSLOPES option appears without the EQUALSLOPES option, the parameters for each effect are the separate slopes on the logits rather than differences from the reference slope. The following four labeled TEST statements following the MODEL statement provide tests of proportionality. The first three TEST statements test for proportional odds in each of the three predictors. In the CENTER proportionality test (labeled CENTER_PO), the equality of the CENTER parameters across the four logits is tested. Similarly for the BASELINE proportionality test. For the TRT test, equality across the logits is tested simultaneously within each level of TRT. Parameter names used in the TEST statement are specified as described in "Parameter Names in the OUTEST= Data Set" in the Details section of the LOGISTIC documentation. The final TEST statement provides an overall test of proportional odds, similar to the test provided when the UNEQUALSLOPES option is not used. It combines the previous three tests into a single, joint test.
proc logistic data=dent; class center baseline trt / param=ref order=data; model resp(descending) = center baseline trt / unequalslopes; CENTER_PO:test center1_4=center1_3=center1_2=center1_1; BASELINE_PO:test baseline0_4=baseline0_3=baseline0_2=baseline0_1; TRT_PO:test trtacl_4=trtacl_3=trtacl_2=trtacl_1, trttl_4=trttl_3=trttl_2=trttl_1, trtach_4=trtach_3=trtach_2=trtach_1, trtth_4=trtth_3=trtth_2=trtth_1; OVERALL_PO:test center1_4=center1_3=center1_2=center1_1, baseline0_4=baseline0_3=baseline0_2=baseline0_1, trtacl_4=trtacl_3=trtacl_2=trtacl_1, trttl_4=trttl_3=trttl_2=trttl_1, trtach_4=trtach_3=trtach_2=trtach_1, trtth_4=trtth_3=trtth_2=trtth_1; run;
In this parameterization of the fully nonproportional odds model, the separate slopes of each effect on each logit are given. The logit to which each parameter estimate applies is given in the RESP column. The value in the RESP column is the level after which the ordered response levels are cut to dichotomize the response. For instance, the value 4 indicates the logit comparing level 4 with levels 3, 2, and 1. The value 3 indicates the logit comparing levels 4 and 3 with levels 2 and 1, and so on.
|
The overall test of proportional odds from the fourth TEST statement (OVERALL_PO) is a Wald test and gives results (p=0.0038) similar to the score test provided with the proportional odds model above. As before, only the test of the TRT effect (TRT_PO) suggests the possibility of nonproportionality.
|
If you are willing to assume that proportional odds holds for the BASELINE and CENTER predictors, then a partial proportional odds model can be fit which allows for nonproportionality only in the TRT predictor by specifying the UNEQUALSLOPES=TRT option. A common parameter is used across the logits for BASELINE and CENTER while separate parameters are used for the TRT levels. The following statements fit this partial proportional odds model.
Additionally, suppose you want to test whether the TRT levels differ. Normally, this can be done using the DIFF option in an LSMEANS statement. However, this statement is not available when the UNEQUALSLOPES or EQUALSLOPES option is specified. Instead, TEST statements are used. The first four labeled TEST statements test for treatment differences in each of the four logits. The final TEST statement (TRT_PO) tests for proportional odds for TRT in this simplified model.
proc logistic data=dent; class center baseline trt / param=ref order=data; model resp(descending)=center baseline trt / unequalslopes=trt; TRT_RESP4:test trtacl_4,trttl_4,trtach_4,trtth_4; TRT_RESP3:test trtacl_3,trttl_3,trtach_3,trtth_3; TRT_RESP2:test trtacl_2,trttl_2,trtach_2,trtth_2; TRT_RESP1:test trtacl_1,trttl_1,trtach_1,trtth_1; TRT_PO:test trtacl_4=trtacl_3=trtacl_2=trtacl_1, trttl_4=trttl_3=trttl_2=trttl_1, trtach_4=trtach_3=trtach_2=trtach_1, trtth_4=trtth_3=trtth_2=trtth_1; run;
The table of parameter estimates shows the separate intercepts and separate TRT parameters for each logit, and the single parameter for each of BASELINE and CENTER in this partial proportional odds model.
|
The tests for treatment differences suggest that the treatments differ on logits 1, 2, and 3, but not on logit 4 (p=0.4925). The test for proportional odds in the treatment effect (TEST_PO) continues to indicate possible lack of proportionality (p=0.0729).
|
You can use the SELECTION= option to have PROC LOGISTIC determine which effects exhibit nonproportional odds and choose a final model. The following statements use stepwise effect selection to select a final model from a set of candidate effects that includes all equal and unequal slope parameters. Beginning in SAS 9.4 TS1M2, the EQUALSLOPES option makes the equal slope parameters available. Used with the UNEQUALSLOPES option, all equal and unequal slope parameters are available for effect selection. The INCLUDE=EQUALSLOPES option restricts the selection by requiring every model in the selection process to include the equal slope parameters. The selection process tests the unequal slope parameters for an effect and includes and retains them if significant at the 0.1 level. The DETAILS option shows the tests of the candidate effects at each step.
proc logistic data=dent; class center baseline trt / param=ref order=data; model resp(descending)=center baseline trt / selection=stepwise slentry=0.1 slstay=0.1 equalslopes unequalslopes include=equalslopes details; run;
The model in the first step includes the intercepts and equal slope parameters. This is the same proportional odds model as shown above. The analysis at this step includes tests of the remaining candidate effects, which are the unequal slope parameters for each of the three predictors. The U_ prefix indicates that these are the unequal slope parameters for each predictor. The score tests give similar results to the Wald tests for proportional odds shown above.
|
Since the unequal slopes effect for TRT is significant at the 0.1 level, it is included in the next step of the model selection process. This results in the same partial proportional odds model shown above but is parameterized slightly differently with a set of equal slope parameters followed by sets of unequal slope parameters. For each TRT level, the equal slope parameter is used as the slope for the last logit (RESP=1), and the unequal slope parameters (U_TRT) are deviations from the equal slope parameter.
|
Since tests of the remaining nonproportional odds effects are not significant, the selection process ends.
|
An additional example of fitting nonproportional odds models and using model selection can be found in the example titled "Partial Proportional Odds Model" in the LOGISTIC documentation.
References
Peterson, B. and F. Harrell (1990), "Partial Proportional Odds Models for Ordinal Response Variables," Applied Statistics , 39:205-217.
Stokes, M.E., Davis, C.S., and Koch, G.G.
Addendum: Using NLMIXED to fit ordinal logistic models
The following section fits the same models as above using PROC NLMIXED.
In PROC NLMIXED, reference-coded dummies (T1-T4, B, and C) are created in the same way as done by the CLASS statement in PROC LOGISTIC. Four cumulative probabilities (CP1-CP4) are computed on the five levels of the response (RESP=0, 1, 2, 3, 4) . The model is contained in these statements. INT1-INT4 are the four intercepts of the proportional odds model. Note that the same name is used across the four cumulative probabilities for each of the other model parameters (BC1, BB0, BACL, BTL, BACH, and BTH). Initial values for the intercepts are provided in the previous PARMS statement. The increasing initial values help ensure properly increasing cumulative probabilities and avoid estimation problems. All other parameters are started at the default value of one. The individual probabilities are computed from the cumulative probabilities by taking differences. For instance, the probability of response 3 is obtained by subtracting the probability of response 4 from the probability of response 3 or 4. These statements result in modeling the probability of higher response levels as in the LOGISTIC modelNOTE. The next three statements ensure that the predicted probability values are valid and define the log likelihood to be maximized by the procedure. The ID and PREDICT statements save the cumulative probabilities and the predicted probability of the observed response to data set NLM.
proc nlmixed data=dent; parms Int4 = -1, Int3 = 0, Int2 = 1, Int1 = 2; t1=(trt='ACL'); t2=(trt='TL'); t3=(trt='ACH'); t4=(trt='TH'); b=(baseline=0); c=(center=1); cp4= logistic(Int4 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp3= logistic(Int3 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp2= logistic(Int2 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp1= logistic(Int1 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); if resp=4 then ip = cp4; /* CP4 is Pr(resp=4) */ else if resp=3 then ip = cp3-cp4; /* CP3 is Pr(resp=4 or 3) */ else if resp=2 then ip = cp2-cp3; /* CP2 is Pr(resp=4 or 3 or 2) */ else if resp=1 then ip = cp1-cp2; /* CP1 is Pr(resp=4 or 3 or 2 or 1) */ else ip = 1-cp1; /* IP is Pr(resp=observed level) */ p = (ip>0 and ip<=1)*ip + (ip<=0)*1e-8 + (ip>1); loglik = log(p); model resp ~ general(loglik); id cp1-cp4; predict ip out=nlm; run;
The parameter estimates from PROC NLMIXED agree closely with those from PROC LOGISTIC. The standard errors differ slightly since LOGISTIC constructs Wald tests while NLMIXED uses t-tests based on the available degrees of freedom. The predicted probabilities produced by the two procedures (in data sets LOG and NLM) are also very similar.
Note that unique names are used across the cumulative probabilities for each model effect. For instance, instead of the common parameter BC1 for the CENTER effect in the proportional odds model above, there are now four parameters, BC11, BC12, BC13, and BC14. Similarly for the BASELINE and TRT effects. The four CONTRAST statements with "PO" in their labels provide proportional odds tests for each of the predictors as well as an overall test comparable to the test provided by PROC LOGISTIC. Preceding these, additional contrasts are done to test the effect of each predictor on logit 1 which contrasts response level 0 (no relief) with the higher response levels representing varying degrees of relief.
The parameter estimates of the fully nonproportional odds model has four complete sets of parameter estimates compared to the single set in the proportional odds model. The logit to which the parameter estimate applies is given in the last character of its name. For instance, the estimates Int4, bc14, bb04, bACL4, bTL4, bACH4, and bTH4 are the parameter estimates associated with the fourth logit which contrasts the highest level of relief against all other levels.
proc nlmixed data=dent; parms Int4 = -1, Int3 = 0, Int2 = 1, Int1 = 2; t1=(trt='ACL'); t2=(trt='TL'); t3=(trt='ACH'); t4=(trt='TH'); b=(baseline=0); c=(center=1); cp4= logistic(Int4 + bc14*c + bb04*b + bACL4*t1 + bTL4*t2 + bACH4*t3 + bTH4*t4); cp3= logistic(Int3 + bc13*c + bb03*b + bACL3*t1 + bTL3*t2 + bACH3*t3 + bTH3*t4); cp2= logistic(Int2 + bc12*c + bb02*b + bACL2*t1 + bTL2*t2 + bACH2*t3 + bTH2*t4); cp1= logistic(Int1 + bc11*c + bb01*b + bACL1*t1 + bTL1*t2 + bACH1*t3 + bTH1*t4); if resp=4 then ip = cp4; else if resp=3 then ip = cp3-cp4; else if resp=2 then ip = cp2-cp3; else if resp=1 then ip = cp1-cp2; else ip = 1-cp1; p = (ip>0 and ip<=1)*ip + (ip<=0)*1e-8 + (ip>1); loglik = log(p); model resp ~ general(loglik); id cp1-cp4; predict ip out=nlm; contrast 'center on logit1 (4321 vs 0)' bc11; contrast 'baseline on logit1 (4321 vs 0)' bb01; contrast 'trt on logit1 (4321 vs 0)' bACL1, bTL1, bACH1, bTH1; contrast 'center PO' bc11-bc14, bc12-bc14, bc13-bc14; contrast 'baseline PO' bb01-bb04, bb02-bb04, bb03-bb04; contrast 'trt PO' bACL1-bACL4, bACL2-bACL4, bACL3-bACL4, bTL1-bTL4, bTL2-bTL4, bTL3-bTL4, bACH1-bACH4, bACH2-bACH4, bACH3-bACH4, bTH1-bTH4, bTH2-bTH4, bTH3-bTH4; contrast 'overall PO' bc11-bc14, bc12-bc14, bc13-bc14, bb01-bb04, bb02-bb04, bb03-bb04, bACL1-bACL4, bACL2-bACL4, bACL3-bACL4, bTL1-bTL4, bTL2-bTL4, bTL3-bTL4, bACH1-bACH4, bACH2-bACH4, bACH3-bACH4, bTH1-bTH4, bTH2-bTH4, bTH3-bTH4; run;
A partial proportional odds model is fit with a common parameter across the logits for BASELINE (BB0) and CENTER (BC1) and separate parameters for the TRT levels. The following statements fit this partial proportional odds model in PROC NLMIXED.
proc nlmixed data=dent; parms Int4 = -1, Int3 = 0, Int2 = 1, Int1 = 2; t1=(trt='ACL'); t2=(trt='TL'); t3=(trt='ACH'); t4=(trt='TH'); b=(baseline=0); c=(center=1); cp4= logistic(Int4 + bc1*c + bb0*b + bACL4*t1 + bTL4*t2 + bACH4*t3 + bTH4*t4); cp3= logistic(Int3 + bc1*c + bb0*b + bACL3*t1 + bTL3*t2 + bACH3*t3 + bTH3*t4); cp2= logistic(Int2 + bc1*c + bb0*b + bACL2*t1 + bTL2*t2 + bACH2*t3 + bTH2*t4); cp1= logistic(Int1 + bc1*c + bb0*b + bACL1*t1 + bTL1*t2 + bACH1*t3 + bTH1*t4); if resp=4 then ip = cp4; else if resp=3 then ip = cp3-cp4; else if resp=2 then ip = cp2-cp3; else if resp=1 then ip = cp1-cp2; else ip = 1-cp1; p = (ip>0 and ip<=1)*ip + (ip<=0)*1e-8 + (ip>1); loglik = log(p); model resp ~ general(loglik); id cp1-cp4; predict ip out=nlm; run;
__________
NOTE: If you want to model the probability of lower, rather than higher, response levels omit the DESCENDING response variable option in the MODEL statement of PROC LOGISTIC. In PROC NLMIXED, you could simply use increasing values of RESP in the statements defining the individual probability. For instance:
if resp=0 then ip = cp4; else if resp=1 then ip = cp3-cp4; else if resp=2 then ip = cp2-cp3; else if resp=3 then ip = cp1-cp2; else ip = 1-cp1;
But the results will be more meaningfully labeled if you also change the names of the parameters and cumulative probabilities as below for the proportional odds model:
proc nlmixed data=dent; parms Int0 = -1, Int1 = 0, Int2 = 1, Int3 = 2; t1=(trt='ACL'); t2=(trt='TL'); t3=(trt='ACH'); t4=(trt='TH'); b=(baseline=0); c=(center=1); cp0= logistic(Int0 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp1= logistic(Int1 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp2= logistic(Int2 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); cp3= logistic(Int3 + bc1*c + bb0*b + bACL*t1 + bTL*t2 + bACH*t3 + bTH*t4); if resp=0 then ip = cp0; /* CP0 is Pr(resp=0) */ else if resp=1 then ip = cp1-cp0; /* CP1 is Pr(resp=0 or 1) */ else if resp=2 then ip = cp2-cp1; /* CP2 is Pr(resp=0 or 1 or 2) */ else if resp=3 then ip = cp3-cp2; /* CP3 is Pr(resp=0 or 1 or 2 or 3) */ else ip = 1-cp3; /* IP is Pr(resp=observed level) */ p = (ip>0 and ip<=1)*ip + (ip<=0)*1e-8 + (ip>1); loglik = log(p); model resp ~ general(loglik); id cp1-cp4; predict ip out=nlm; run;
Product Family | Product | System | SAS Release | |
Reported | Fixed* | |||
SAS System | SAS/STAT | All | n/a |
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> GENMOD SAS Reference ==> Procedures ==> LOGISTIC Analytics ==> Categorical Data Analysis SAS Reference ==> Procedures ==> NLMIXED |
Date Modified: | 2021-07-26 15:23:56 |
Date Created: | 2002-12-16 10:56:41 |