In matched pairs, or case-control, studies, conditional logistic regression is used to investigate the relationship between an outcome of being an event (case) or a nonevent (control) and a set of prognostic factors.
The following data are a subset of the data from the Los Angeles Study of the Endometrial Cancer Data in Breslow and Day (1980). There are 63 matched pairs, each consisting of a case of endometrial cancer (Outcome=1) and a control (Outcome=0). The case and corresponding control have the same ID. Two prognostic factors are included: Gall (an indicator variable for gall bladder disease) and Hyper (an indicator variable for hypertension). The goal of the case-control analysis is to determine the relative risk for gall bladder disease, controlling for the effect of hypertension.
data Data1;
   do ID=1 to 63;
      do Outcome = 1 to 0 by -1;
         input Gall Hyper @@;
         output;
      end;
   end;
   datalines;
0 0  0 0    0 0  0 0    0 1  0 1    0 0  1 0
1 0  0 1    0 1  0 0    1 0  0 0    1 1  0 1
0 0  0 0    0 0  0 0    1 0  0 0    0 0  0 1
1 0  0 1    1 0  1 0    1 0  0 1    0 1  0 0
0 0  1 1    0 0  1 1    0 0  0 1    0 1  0 0
0 0  1 1    0 1  0 1    0 1  0 0    0 0  0 0
0 0  0 0    0 0  0 1    1 0  0 1    0 0  0 1
1 0  0 0    0 1  0 0    0 1  0 0    0 1  0 0
0 1  0 0    0 0  0 0    1 1  1 1    0 0  0 1
0 1  0 0    0 1  0 1    0 1  0 1    0 1  0 0
0 0  0 0    0 1  1 0    0 0  0 1    0 0  0 0
1 0  0 0    0 0  0 0    1 1  0 0    0 1  0 0
0 0  0 0    0 1  0 1    0 0  0 0    0 1  0 1
0 1  0 0    0 1  0 0    1 0  0 0    0 0  0 0
1 1  1 0    0 0  0 0    0 0  0 0    1 1  0 0
1 0  1 0    0 1  0 0    1 0  0 0
;
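Before fitting a model, it can help to see how the pairs split on Gall, because only pairs in which the case and the control differ on the covariate contribute to a conditional analysis. The following is a minimal sketch, not part of the original example. The data set name Pairs and the generated variable names Gall_1 and Gall_0 are assumptions introduced here for illustration.

```sas
/* Reshape Data1 to one row per matched pair.                     */
/* Data1 is already in ID order because of the DO loop that       */
/* created it, so no PROC SORT is needed before BY ID.            */
proc transpose data=Data1 out=Pairs(drop=_name_) prefix=Gall_;
   by ID;
   id Outcome;    /* Outcome=1 yields Gall_1 (case), 0 yields Gall_0 (control) */
   var Gall;
run;

/* Cross-tabulate case exposure against control exposure.         */
/* The off-diagonal cells are the discordant pairs.               */
proc freq data=Pairs;
   tables Gall_1*Gall_0 / norow nocol nopercent;
run;
```

A hand count of the data lines above gives 13 pairs in which only the case has gall bladder disease and 5 pairs in which only the control does. The off-diagonal cells of the table should reproduce those counts.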
There are several ways to approach this problem with PROC LOGISTIC:
Specify the STRATA statement to perform a conditional logistic regression.
Specify EXACT and STRATA statements to perform an exact logistic regression on the original data set, if you believe the data set is too small or too sparse for the usual asymptotics to hold.
Transform each matched pair into a single observation, and then specify a PROC LOGISTIC statement on this transformed data without a STRATA statement; this also performs a conditional logistic regression and produces essentially the same results.
Specify an EXACT statement on the transformed data.
SAS statements and selected results for the first two approaches are given in the remainder of this example.
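Although the remainder of this example covers only the first two approaches, the third can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the documented method. The data set name Data2 and the helper variables id1, gall1, and hyper1 are hypothetical, and the sketch assumes that PROC LOGISTIC accepts a response with a single level when the NOINT option is specified.

```sas
/* Collapse each matched pair into one observation holding the    */
/* case-minus-control difference of each covariate.               */
/* Within each ID, Data1 lists the case first, then the control.  */
data Data2;
   set Data1;
   drop id1 gall1 hyper1;
   retain id1 gall1 hyper1 0;
   if (ID = id1) then do;          /* second record of the pair: the control */
      Gall  = gall1  - Gall;       /* case value minus control value */
      Hyper = hyper1 - Hyper;
      output;                      /* Outcome is 0 on every output row */
   end;
   else do;                        /* first record of the pair: the case */
      id1 = ID;
      gall1 = Gall;
      hyper1 = Hyper;
   end;
run;

/* No STRATA statement. NOINT removes the intercept so that the   */
/* fit corresponds to the conditional likelihood for 1:1 pairs.   */
proc logistic data=Data2;
   model outcome=Gall / noint;
run;
```

Data2 has 63 observations, one per pair, and Outcome is constant at 0, so the EVENT= response option used elsewhere in this example does not apply here.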
In the following statements, PROC LOGISTIC is invoked with the ID variable declared in the STRATA statement to obtain the conditional logistic model estimates for a model containing Gall as the only predictor variable:
proc logistic data=Data1;
   strata ID;
   model outcome(event='1')=Gall;
run;
Results from the conditional logistic analysis are shown in Output 72.11.1. Note that there is no intercept term in the "Analysis of Maximum Likelihood Estimates" table.
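The absence of an intercept follows from the form of the conditional likelihood. As a sketch, let \alpha_i be a pair-specific intercept and let x_{i1} and x_{i0} be the covariate vectors of the case and the control in pair i. This notation is assumed here and does not appear in the original. Conditioning on exactly one case per pair, the contribution of pair i is:

```latex
\[
L_i(\beta)
  = \frac{e^{\alpha_i + x_{i1}'\beta}}
         {e^{\alpha_i + x_{i1}'\beta} + e^{\alpha_i + x_{i0}'\beta}}
  = \frac{1}{1 + e^{-(x_{i1} - x_{i0})'\beta}}
\]
```

The factor e^{\alpha_i} cancels, so no intercept can be estimated, and a pair with x_{i1} = x_{i0} contributes the constant 1/2 regardless of \beta.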
The odds ratio estimate for Gall is 2.60, which is marginally significant (p = 0.0694) and which is an estimate of the relative risk for gall bladder disease. A 95% confidence interval for this relative risk is (0.927, 7.293).
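These figures can be checked by hand. With 1:1 matching and a single binary covariate, the conditional maximum likelihood estimate of the odds ratio is the ratio of the two discordant-pair counts. The counts used below, n_{10} = 13 pairs in which only the case has gall bladder disease and n_{01} = 5 pairs in which only the control does, come from a hand tabulation of the data lines and are not shown in the original output. The symbols n_{10} and n_{01} are notation assumed here.

```latex
\begin{align*}
\widehat{\mathrm{OR}} &= \frac{n_{10}}{n_{01}} = \frac{13}{5} = 2.60,
  \qquad \hat\beta = \log 2.60 \approx 0.9555 \\
\mathrm{SE}(\hat\beta) &= \sqrt{\frac{1}{n_{10}} + \frac{1}{n_{01}}}
  = \sqrt{\frac{1}{13} + \frac{1}{5}} \approx 0.5262 \\
\text{95\% Wald CI} &= \exp\left(0.9555 \pm 1.96 \times 0.5262\right)
  \approx (0.927,\ 7.293) \\
\text{Wald } \chi^2 &= \left(\frac{0.9555}{0.5262}\right)^2 \approx 3.297
\end{align*}
```

A Wald chi-square of about 3.297 on 1 degree of freedom is consistent with the reported p = 0.0694.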
Output 72.11.1: Conditional Logistic Regression (Gall as Risk Factor)
When you believe there are not enough data or that the data are too sparse, you can perform a stratified exact logistic regression. The following statements perform stratified exact logistic regressions on the original data set by specifying both the STRATA and EXACT statements:
proc logistic data=Data1 exactonly;
   strata ID;
   model outcome(event='1')=Gall;
   exact Gall / estimate=both;
run;
Output 72.11.2: Exact Logistic Regression (Gall as Risk Factor)
Note that the score statistic in the "Conditional Exact Tests" table in Output 72.11.2 is identical to the score statistic in Output 72.11.1 from the conditional analysis. The exact odds ratio confidence interval is much wider than its conditional analysis counterpart, but the parameter estimates are similar. The exact analysis confirms the marginal significance of Gall as a predictor variable.
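The two analyses share a score statistic because both use the same statistic and differ only in the reference distribution used for the p-value. For 1:1 matched pairs with one binary covariate, the score statistic reduces to McNemar's statistic on the discordant pairs, and under the null hypothesis the number of discordant pairs with an exposed case is Binomial(m, 1/2). The following DATA step is a rough hand check under those assumptions, not a replacement for the PROC LOGISTIC output. The counts 13 and 5 come from a hand tabulation of the data, and all variable names are illustrative.

```sas
data _null_;
   n10 = 13;                 /* pairs with only the case exposed (hand count)    */
   n01 = 5;                  /* pairs with only the control exposed (hand count) */
   m   = n10 + n01;          /* total number of discordant pairs                 */

   /* McNemar form of the score statistic: (13 - 5)**2 / 18 */
   score = (n10 - n01)**2 / m;

   /* Asymptotic p-value from a chi-square distribution with 1 df */
   p_asym = 1 - probchi(score, 1);

   /* Two-sided exact p-value from Binomial(m, 0.5).           */
   /* PROBBNML(p, n, k) returns P(X <= k).                     */
   p_exact = min(1, 2 * (1 - probbnml(0.5, m, max(n10, n01) - 1)));

   put score= p_asym= p_exact=;
run;
```

The score statistic evaluates to 64/18, or about 3.556. The exact p-value is larger than the asymptotic one, which fits the wider exact confidence interval noted above.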