Exact Logistic Regression

Release 8.1 of SAS/STAT software provides new facilities in the LOGISTIC procedure for performing exact logistic regression. Exact logistic regression has become an important analytical technique, especially in the pharmaceutical industry, because the usual asymptotic methods are unreliable for small, skewed, or sparse data sets. Exact inference is based on the distributions of the sufficient statistics for the parameters of interest in a logistic regression model, conditional on the sufficient statistics for the remaining parameters. Generating these distributions by direct enumeration is computationally infeasible for many problems. Hirji, Mehta, and Patel (1987) developed an efficient algorithm for generating the required conditional distributions, which makes these methods computationally practical.

Introduction

Many clinical trials deal with the comparison of populations of subjects with categorical responses. Historically, statistical inference for such studies involves large-sample approximations, and fitting logistic regression models to such data is performed through the unconditional likelihood function. However, asymptotic methods may be inadequate when sample sizes are small or the data are sparse or skewed. Exact conditional inference remains valid in such situations.

The LOGISTIC, GENMOD, PROBIT, and CATMOD procedures perform unconditional likelihood inference for logit models, and the PHREG procedure can perform asymptotic conditional likelihood inference for logit models. However, SAS users have requested the ability to perform exact tests for logistic regression modeling. Release 8.1 SAS/STAT software introduces exact logistic regression for the binary response with the new EXACT statement in the LOGISTIC procedure.

Capabilities

The exact conditional logistic regression facilities in the LOGISTIC procedure provide numerous features, including:

Note that hypothesis tests can be generated for each individual effect in an EXACT statement or for all effects simultaneously. However, parameter estimates are computed for each effect individually.

Example: Dose-Response Study

Consider a small dose-response study in which researchers want to analyze how the mortality rate changes with the dosage of a drug. The Dose data set contains life/death outcomes for six levels of drug dosage (0 to 5). Three subjects are given each dose of the drug, and the number of deaths is recorded.

   data dose;
     input Dose Deaths Total @@;
     datalines;
   0 0 3  1 0 3  2 0 3  3 0 3
   4 1 3  5 2 3
   ;
   run;

All of the cells have counts that are less than 5, which makes the applicability of large-sample theory questionable. For each subject i receiving dosage x_i, i = 1, ..., 18, let Y_i = 1 if the subject died, Y_i = 0 otherwise, and let p_i = Pr(Y_i = 1 | x_i). Then the linear logistic model for this problem is logit(p_i) = log(p_i / (1 - p_i)) = a + b x_i, which fits a common intercept a and slope b for all 18 subjects. In the PROC LOGISTIC invocation below, the EXACT statement requests an exact analysis of the Dose effect, and the ESTIMATE=BOTH option produces exact parameter estimates and exact odds ratios.

   proc logistic data=dose descending;
     model Deaths/Total = Dose;
     exact Dose  / estimate = both;
   run;
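
To make the conditioning concrete, the following is a minimal sketch, written in Python rather than SAS purely as an outside cross-check, of the distribution that an exact analysis of this data set works with. It assumes the model above, in which the number of deaths s is the sufficient statistic for the intercept a and the sum of the doses of the subjects who died, t, is the sufficient statistic for the slope b. It enumerates every way of assigning the observed number of deaths to the 18 subjects by brute force; it is not the Hirji, Mehta, and Patel algorithm and not what PROC LOGISTIC does internally. All variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Grouped data from the DATA step above: (dose, deaths, total).
groups = [(0, 0, 3), (1, 0, 3), (2, 0, 3), (3, 0, 3), (4, 1, 3), (5, 2, 3)]

# One dose value per subject (18 subjects in all).
doses = [dose for dose, _, total in groups for _ in range(total)]

# Observed sufficient statistics: s for the intercept a, t for the slope b.
s_obs = sum(deaths for _, deaths, _ in groups)
t_obs = sum(dose * deaths for dose, deaths, _ in groups)

# Conditional on s_obs deaths, every choice of which subjects died is
# equally likely when b = 0. Tally the slope statistic over all choices.
counts = Counter(sum(chosen) for chosen in combinations(doses, s_obs))
n_tables = sum(counts.values())

for t in sorted(counts):
    print(f"t = {t:2d}  count = {counts[t]:3d}  prob = {counts[t] / n_tables:.4f}")
print(f"observed: s = {s_obs}, t = {t_obs}")
```

With 18 subjects and 3 deaths there are only 816 assignments, so enumeration is trivial here. The number of assignments grows combinatorially with the sample size and the number of covariates, which is why an efficient algorithm is needed in general.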

Figure 1. Asymptotic Analysis Results

Figure 1 displays some of the unconditional asymptotic results that are produced by default. The likelihood ratio and score tests reject the null hypothesis that b is zero, but the Wald test does not. These conflicting conclusions are a sign that the large-sample approximation is unreliable. The estimates for the intercept a and the slope b both have p-values greater than 0.05, indicating at most marginal influence. The confidence limits for the odds ratio of the dose parameter contain 1. Therefore, if you accept these asymptotic results, you cannot conclude that mortality changes with dosage.
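
The score test is the one asymptotic statistic here that has a closed form, so it can be checked by hand. The sketch below (Python, illustrative names) assumes the standard score statistic for the slope of a one-covariate logistic model with the intercept treated as a nuisance parameter, and a chi-square reference distribution with 1 degree of freedom. The figure itself is not reproduced in this text, so the value is not compared against it.

```python
import math

# Grouped data from the DATA step above: (dose, deaths, total).
groups = [(0, 0, 3), (1, 0, 3), (2, 0, 3), (3, 0, 3), (4, 1, 3), (5, 2, 3)]

n = sum(total for _, _, total in groups)                   # number of subjects
deaths = sum(d for _, d, _ in groups)                      # number of deaths
p0 = deaths / n                                            # death rate when b = 0
x_bar = sum(dose * total for dose, _, total in groups) / n

# Score for b at b = 0: sum over subjects of x_i * (y_i - p0).
score = sum(dose * (d - total * p0) for dose, d, total in groups)

# Information for b, adjusted for the intercept: p0 * (1 - p0) * Sxx.
sxx = sum(total * (dose - x_bar) ** 2 for dose, _, total in groups)
info = p0 * (1 - p0) * sxx

score_chi2 = score ** 2 / info
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(score_chi2 / 2))

print(f"score chi-square = {score_chi2:.4f}, asymptotic p-value = {p_value:.4f}")
```

Under these assumptions the statistic exceeds the 5% critical value of 3.84, which is consistent with the statement that the score test rejects the null hypothesis.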

Figure 2. Exact Analysis Results

Figure 2 displays the results produced by the EXACT statement. The p-values in the "Conditional Exact Tests" table lead you to reject the null hypothesis that b is equal to zero (no conclusions can be made about a because it is "conditioned" away). Note that the p-values for the asymptotic estimates are larger than the p-values for the exact estimates. The "Exact Parameter Estimates" table shows that the slope b is estimated to be 1.8 with an exact p-value of 0.0245. Note that the confidence interval for the odds ratio does not include the value 1; thus, the odds of death increase with dosage.
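
The exact p-value and slope estimate quoted above can be approximately reproduced outside SAS. The following Python sketch rests on two assumptions that are not stated in this text: the two-sided exact p-value is taken as twice the smaller one-sided tail probability, and the estimate is the conditional maximum likelihood estimate, that is, the value of b at which the conditional mean of the slope statistic equals its observed value. Brute-force enumeration and bisection stand in for the procedure's actual algorithms, and all names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

doses = [d for d in range(6) for _ in range(3)]  # three subjects at each dose 0..5
s_obs, t_obs = 3, 14  # total deaths; sum of the doses of the subjects who died

# Number of ways to obtain each value of the slope statistic given s_obs deaths.
counts = Counter(sum(chosen) for chosen in combinations(doses, s_obs))
n_tables = sum(counts.values())

# Tail probabilities under b = 0; the smaller one is doubled (and capped at 1).
upper_tail = sum(c for t, c in counts.items() if t >= t_obs) / n_tables
lower_tail = sum(c for t, c in counts.items() if t <= t_obs) / n_tables
p_two_sided = min(1.0, 2 * min(upper_tail, lower_tail))

def conditional_mean(b):
    """Mean of the slope statistic under the conditional distribution at slope b."""
    weights = {t: c * math.exp(b * t) for t, c in counts.items()}
    return sum(t * w for t, w in weights.items()) / sum(weights.values())

# The conditional mean increases with b, so bisection finds the root.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if conditional_mean(mid) < t_obs:
        lo = mid
    else:
        hi = mid
b_hat = (lo + hi) / 2

print(f"exact two-sided p-value = {p_two_sided:.4f}")
print(f"conditional estimate of b = {b_hat:.4f}, odds ratio = {math.exp(b_hat):.2f}")
```

Only 10 of the 816 equally likely assignments give a slope statistic of 14 or more, so the doubled tail is 20/816, which rounds to 0.0245.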

For more information about performing exact logistic regression using the LOGISTIC procedure, refer to the paper "Performing Exact Logistic Regression with the SAS System" by Robert Derr.

References

Hirji, Karim F., Mehta, Cyrus R., and Patel, Nitin R. (1987), "Computing Distributions for Exact Logistic Regression," Journal of the American Statistical Association, 82, 1110-1117.

Stokes, Maura E., Davis, Charles S., and Koch, Gary G. (1995), Categorical Data Analysis Using the SAS System, Cary, NC: SAS Institute Inc.

