The SURVEYLOGISTIC Procedure

Odds Ratio Estimation

Consider a dichotomous response variable with outcomes event and nonevent. Let a dichotomous risk factor variable X take the value 1 if the risk factor is present and 0 if the risk factor is absent. According to the logistic model, the log odds function, $g(X)$, is given by

\[  g(X) \equiv \log \biggl (\frac{\Pr (~ \mathit{event} ~ |~  X)}{\Pr (~ \mathit{nonevent} ~ |~  X)} \biggr ) = \beta _0 + \beta _1 X  \]

The odds ratio $\psi $ is defined as the ratio of the odds for those with the risk factor (X = 1) to the odds for those without the risk factor (X = 0). The log of the odds ratio is given by

\[  \log (\psi ) \equiv \log (\psi (X=1,X=0)) = g(X=1) - g(X=0) = \beta _1  \]

The parameter $\beta _1$ associated with X represents the change in the log odds from X = 0 to X = 1, so the odds ratio is obtained by exponentiating the parameter associated with the risk factor: $\psi = \exp (\beta _1)$. The odds ratio indicates how the odds of an event change as X changes from 0 to 1. For instance, $\psi =2$ means that the odds of an event when X = 1 are twice the odds of an event when X = 0.
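A quick numerical check of this result, written as a minimal Python sketch: the intercept and slope values are hypothetical, chosen so that $\exp (\beta _1) = 2$. The factor $\exp (\beta _0)$ appears in both odds and cancels in the ratio, so $\psi$ equals $\exp (\beta _1)$ whatever the intercept is.

```python
import math


def odds(beta0: float, beta1: float, x: float) -> float:
    """Odds of the event under the logistic model: exp(g(X)) with g(X) = beta0 + beta1*X."""
    return math.exp(beta0 + beta1 * x)


# Hypothetical coefficients for illustration only; beta1 = log(2).
beta0, beta1 = -1.2, math.log(2.0)

# Odds ratio of X = 1 versus X = 0; the exp(beta0) factor cancels.
psi = odds(beta0, beta1, 1) / odds(beta0, beta1, 0)
print(f"psi = {psi:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```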

Suppose the values of the dichotomous risk factor are coded as constants a and b instead of 0 and 1. The odds when $X = a$ become $\exp (\beta _0 + a \beta _1)$, and the odds when $X = b$ become $\exp (\beta _0 + b \beta _1)$. The odds ratio corresponding to an increase in X from a to b is

\[  \psi = \exp [(b - a) \beta _1] = [\exp (\beta _1)]^{b-a} \equiv [\exp (\beta _1)]^ c  \]

Note that for any a and b such that $c = b - a = 1$, $\psi = \exp (\beta _1)$. So the odds ratio can be interpreted as the multiplicative change in the odds for any one-unit increase in the corresponding risk factor. However, the change in odds for some amount other than one unit is often of greater interest. For example, a change of one pound in body weight might be too small to be considered important, while a change of 10 pounds might be more meaningful. The odds ratio for a change in X from a to b is estimated by raising the odds ratio estimate for a unit change in X to the power $c = b - a$, as shown previously.
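The following Python sketch illustrates the power rule with a hypothetical per-pound coefficient ($\beta _1 = 0.02$, not taken from any fitted model): the odds ratio for a 10-pound increase equals the unit-change odds ratio raised to the power 10, and it does not depend on the starting weight a.

```python
import math


def odds_ratio_for_change(beta1: float, a: float, b: float) -> float:
    """Odds ratio for increasing X from a to b: exp((b - a) * beta1)."""
    return math.exp((b - a) * beta1)


beta1 = 0.02                      # hypothetical log odds ratio per pound
unit_or = math.exp(beta1)         # odds ratio for a 1-pound increase
or_10 = odds_ratio_for_change(beta1, a=150, b=160)  # c = b - a = 10

# Same as raising the unit-change odds ratio to the power c = 10,
# and the same for any starting value a.
assert math.isclose(or_10, unit_or ** 10)
assert math.isclose(or_10, odds_ratio_for_change(beta1, a=200, b=210))
print(f"{unit_or:.4f} {or_10:.4f}")  # 1.0202 1.2214
```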

For a polytomous risk factor, the computation of odds ratios depends on how the risk factor is parameterized. For illustration, suppose that Race is a risk factor with four categories: White, Black, Hispanic, and Other.

For the effect parameterization scheme (PARAM=EFFECT) with White as the reference group, the design variables for Race are as follows.

 

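Design Variables

\[
\begin{array}{lrrr}
\textrm{Race} & X_1 & X_2 & X_3 \\ \hline
\textrm{Black} & 1 & 0 & 0 \\
\textrm{Hispanic} & 0 & 1 & 0 \\
\textrm{Other} & 0 & 0 & 1 \\
\textrm{White} & -1 & -1 & -1
\end{array}
\]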

The log odds for Black is

\[
\begin{aligned}
g(\textrm{Black}) &= \beta _0 + \beta _1 (X_1=1) + \beta _2 (X_2=0) + \beta _3 (X_3=0) \\
&= \beta _0 + \beta _1
\end{aligned}
\]

The log odds for White is

\[
\begin{aligned}
g(\textrm{White}) &= \beta _0 + \beta _1 (X_1=-1) + \beta _2 (X_2=-1) + \beta _3 (X_3=-1) \\
&= \beta _0 - \beta _1 - \beta _2 - \beta _3
\end{aligned}
\]

Therefore, the log odds ratio of Black versus White becomes

\[
\begin{aligned}
\log (\psi (\textrm{Black},\textrm{White})) &= g(\textrm{Black}) - g(\textrm{White}) \\
&= 2 \beta _1 + \beta _2 + \beta _3
\end{aligned}
\]
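The derivation can be checked numerically. The Python sketch below encodes the effect-coded rows from the table above, evaluates $g$ for Black and White with arbitrary (hypothetical) parameter values, and forms the contrast vector $x_{\textrm{Black}} - x_{\textrm{White}}$, whose entries are the coefficients of $\beta _1, \beta _2, \beta _3$ in the log odds ratio. The helper names are illustrative only.

```python
# Effect-coded design rows (X1, X2, X3); White is the reference level.
EFFECT_ROWS = {
    "Black": (1, 0, 0),
    "Hispanic": (0, 1, 0),
    "Other": (0, 0, 1),
    "White": (-1, -1, -1),
}


def log_odds(beta0, betas, row):
    """g = beta0 + beta1*X1 + beta2*X2 + beta3*X3 for one design row."""
    return beta0 + sum(b * x for b, x in zip(betas, row))


def contrast(rows, level, ref):
    """Coefficients L with log(psi(level, ref)) = L . beta (the intercept cancels)."""
    return tuple(x - y for x, y in zip(rows[level], rows[ref]))


beta0, betas = 0.3, (0.5, -0.4, 0.9)  # arbitrary illustrative values
b1, b2, b3 = betas
diff = log_odds(beta0, betas, EFFECT_ROWS["Black"]) - log_odds(beta0, betas, EFFECT_ROWS["White"])

assert abs(diff - (2 * b1 + b2 + b3)) < 1e-12
print(contrast(EFFECT_ROWS, "Black", "White"))  # (2, 1, 1)
```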

For the reference cell parameterization scheme (PARAM=REF) with White as the reference cell, the design variables for Race are as follows.

 

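Design Variables

\[
\begin{array}{lrrr}
\textrm{Race} & X_1 & X_2 & X_3 \\ \hline
\textrm{Black} & 1 & 0 & 0 \\
\textrm{Hispanic} & 0 & 1 & 0 \\
\textrm{Other} & 0 & 0 & 1 \\
\textrm{White} & 0 & 0 & 0
\end{array}
\]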

The log odds ratio of Black versus White is given by

\[
\begin{aligned}
\log (\psi (\textrm{Black},\textrm{White})) &= g(\textrm{Black}) - g(\textrm{White}) \\
&= (\beta _0 + \beta _1 (X_1=1) + \beta _2 (X_2=0) + \beta _3 (X_3=0)) \\
&\quad - (\beta _0 + \beta _1 (X_1=0) + \beta _2 (X_2=0) + \beta _3 (X_3=0)) \\
&= \beta _1
\end{aligned}
\]

For the GLM parameterization scheme (PARAM=GLM), the design variables are as follows.

 

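Design Variables

\[
\begin{array}{lrrrr}
\textrm{Race} & X_1 & X_2 & X_3 & X_4 \\ \hline
\textrm{Black} & 1 & 0 & 0 & 0 \\
\textrm{Hispanic} & 0 & 1 & 0 & 0 \\
\textrm{Other} & 0 & 0 & 1 & 0 \\
\textrm{White} & 0 & 0 & 0 & 1
\end{array}
\]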

The log odds ratio of Black versus White is

\[
\begin{aligned}
\log (\psi (\textrm{Black},\textrm{White})) &= g(\textrm{Black}) - g(\textrm{White}) \\
&= (\beta _0 + \beta _1 (X_1=1) + \beta _2 (X_2=0) + \beta _3 (X_3=0) + \beta _4 (X_4=0)) \\
&\quad - (\beta _0 + \beta _1 (X_1=0) + \beta _2 (X_2=0) + \beta _3 (X_3=0) + \beta _4 (X_4=1)) \\
&= \beta _1 - \beta _4
\end{aligned}
\]
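The same recipe, subtracting the White design row from the Black design row, reproduces both the REF and the GLM results. The Python sketch below hard-codes the rows from the two tables above; the dictionary layout and the `contrast` helper are illustrative and are not part of PROC SURVEYLOGISTIC.

```python
# Design rows from the PARAM=REF and PARAM=GLM tables above.
REF_ROWS = {
    "Black": (1, 0, 0),
    "Hispanic": (0, 1, 0),
    "Other": (0, 0, 1),
    "White": (0, 0, 0),
}
GLM_ROWS = {
    "Black": (1, 0, 0, 0),
    "Hispanic": (0, 1, 0, 0),
    "Other": (0, 0, 1, 0),
    "White": (0, 0, 0, 1),
}


def contrast(rows, level, ref):
    """Coefficients L with log(psi(level, ref)) = L . beta; beta0 cancels."""
    return tuple(x - y for x, y in zip(rows[level], rows[ref]))


print(contrast(REF_ROWS, "Black", "White"))  # (1, 0, 0)      -> beta1
print(contrast(GLM_ROWS, "Black", "White"))  # (1, 0, 0, -1)  -> beta1 - beta4
```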

Consider the hypothetical example of heart disease by race in Hosmer and Lemeshow (2000, p. 51). The entries in the following contingency table are counts.

 

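\[
\begin{array}{lrrrr}
\textrm{Disease Status} & \textrm{White} & \textrm{Black} & \textrm{Hispanic} & \textrm{Other} \\ \hline
\textrm{Present} & 5 & 20 & 15 & 10 \\
\textrm{Absent} & 20 & 10 & 10 & 10
\end{array}
\]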

The computation of the odds ratio of Black versus White under each parameterization scheme is shown in Table 91.9.

Table 91.9: Odds Ratio of Heart Disease Comparing Black to White

\[
\begin{array}{lrrrrl}
\textrm{PARAM=} & \hat{\beta }_1 & \hat{\beta }_2 & \hat{\beta }_3 & \hat{\beta }_4 & \textrm{Odds Ratio Estimate} \\ \hline
\textrm{EFFECT} & 0.7651 & 0.4774 & 0.0719 & & \exp (2 \times 0.7651 + 0.4774 + 0.0719) \approx 8 \\
\textrm{REF} & 2.0794 & 1.7917 & 1.3863 & & \exp (2.0794) \approx 8 \\
\textrm{GLM} & 2.0794 & 1.7917 & 1.3863 & 0.0000 & \exp (2.0794) \approx 8
\end{array}
\]
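The estimates in Table 91.9 can be reproduced from the counts alone, as a sketch under stated assumptions: every observation has weight 1, the modeled event is disease Present, and the model contains only an intercept and the Race main effect. Such a model is saturated for a single four-level factor, so each level's fitted log odds equals its observed $\log (\textrm{present}/\textrm{absent})$. The GLM parameter for White is taken as 0, as shown in the table.

```python
import math

# (present, absent) counts for each race, from the contingency table above.
COUNTS = {"White": (5, 20), "Black": (20, 10), "Hispanic": (15, 10), "Other": (10, 10)}
LEVELS = ["Black", "Hispanic", "Other"]  # levels mapped to X1, X2, X3

# Saturated model (assumed unit weights): fitted log odds = observed log odds.
g = {race: math.log(p / a) for race, (p, a) in COUNTS.items()}

# PARAM=REF: beta_k = g(level_k) - g(White); beta0 = g(White).
ref = [g[lv] - g["White"] for lv in LEVELS]

# PARAM=EFFECT: beta0 is the average log odds over the four levels,
# and beta_k = g(level_k) - beta0 (White's effect is -(beta1 + beta2 + beta3)).
beta0_eff = sum(g.values()) / len(g)
eff = [g[lv] - beta0_eff for lv in LEVELS]

# PARAM=GLM: same beta1..beta3 as REF, with beta4 (White) equal to 0.
glm = ref + [0.0]

or_eff = math.exp(2 * eff[0] + eff[1] + eff[2])
or_ref = math.exp(ref[0])
or_glm = math.exp(glm[0] - glm[3])

print([round(b, 4) for b in ref])  # [2.0794, 1.7918, 1.3863]
print([round(b, 4) for b in eff])  # [0.7651, 0.4774, 0.0719]
print(round(or_eff, 6), round(or_ref, 6), round(or_glm, 6))  # 8.0 8.0 8.0
```

The computed REF and GLM value of $\hat{\beta }_2$ is $\log 6 = 1.79176$, which rounds to 1.7918; the table lists it as 1.7917.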


Because the log odds ratio ($\log (\psi )$) is a linear function of the parameters, the Wald confidence interval for $\log (\psi )$ can be derived from the parameter estimates and the estimated covariance matrix. Confidence intervals for the odds ratios are obtained by exponentiating the corresponding confidence intervals for the log odds ratios. In the displayed output of PROC SURVEYLOGISTIC, the Odds Ratio Estimates table contains the odds ratio estimates and the corresponding 95% Wald confidence intervals, computed by using the covariance matrix described in the section Variance Estimation. For continuous explanatory variables, these odds ratios correspond to a one-unit increase in the risk factor.
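A minimal Python sketch of this computation, assuming you already have the estimates $\hat{\beta }$, a contrast vector $L$ with $\log (\psi ) = L'\beta $, and an estimated covariance matrix $\hat{V}$: the variance of $L'\hat{\beta }$ is $L'\hat{V}L$, and the limits for $\log (\psi )$ are exponentiated. The EFFECT estimates come from Table 91.9, but the covariance matrix is made up for illustration and is not the design-based estimate that PROC SURVEYLOGISTIC computes. The sketch also assumes a standard normal quantile; treat that choice as an assumption rather than a statement of what the procedure uses.

```python
import math
from statistics import NormalDist


def wald_ci_odds_ratio(L, beta, cov, level=0.95):
    """Point estimate and Wald limits for psi = exp(L'beta).

    Uses a standard normal quantile (an assumption of this sketch).
    """
    k = range(len(L))
    log_psi = sum(L[i] * beta[i] for i in k)
    var = sum(L[i] * cov[i][j] * L[j] for i in k for j in k)  # L' V L
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    # Exponentiate the limits of the CI for log(psi).
    return math.exp(log_psi), math.exp(log_psi - half_width), math.exp(log_psi + half_width)


beta_effect = [0.7651, 0.4774, 0.0719]   # EFFECT estimates from Table 91.9
cov_hypothetical = [                     # made-up covariance, for illustration only
    [0.04, -0.01, -0.01],
    [-0.01, 0.05, -0.01],
    [-0.01, -0.01, 0.05],
]
L_black_white = [2, 1, 1]                # contrast for Black versus White (EFFECT coding)

psi, lower, upper = wald_ci_odds_ratio(L_black_white, beta_effect, cov_hypothetical)
print(f"OR = {psi:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```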

To customize odds ratios for specific units of change for a continuous risk factor, you can use the UNITS statement to specify a list of relevant units for each explanatory variable in the model. Estimates of these customized odds ratios are given in a separate table. Let $(L_ j,U_ j)$ be a confidence interval for $\log (\psi ) = \beta _ j$, the log odds ratio for a unit increase in the jth explanatory variable. The corresponding lower and upper confidence limits for the customized odds ratio $\exp (c\beta _ j)$ are $\exp (cL_ j)$ and $\exp (cU_ j)$, respectively, for $c>0$, or $\exp (cU_ j)$ and $\exp (cL_ j)$, respectively, for $c<0$. You use the CLODDS option in the MODEL statement to request confidence intervals for the odds ratios.
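The limit-swapping rule can be written as a small Python function. The name `customized_odds_ratio` and the numbers in the usage lines are hypothetical; `lower_j` and `upper_j` stand for the confidence interval $(L_ j, U_ j)$ for $\beta _ j$.

```python
import math


def customized_odds_ratio(c, beta_j, lower_j, upper_j):
    """Odds ratio exp(c*beta_j) with confidence limits derived from (L_j, U_j).

    For c > 0 the limits are (exp(c*L_j), exp(c*U_j)); for c < 0 multiplying by c
    reverses the order, so they become (exp(c*U_j), exp(c*L_j)).
    """
    if c == 0:
        raise ValueError("the unit of change c must be nonzero")
    if c > 0:
        limits = (math.exp(c * lower_j), math.exp(c * upper_j))
    else:
        limits = (math.exp(c * upper_j), math.exp(c * lower_j))
    return (math.exp(c * beta_j),) + limits


# Hypothetical per-unit estimate and CI, rescaled to a 10-unit increase and decrease.
print(customized_odds_ratio(10, 0.02, 0.01, 0.03))
print(customized_odds_ratio(-10, 0.02, 0.01, 0.03))
```

Returning the limits already ordered means the caller never has to check the sign of c.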

For a generalized logit model, odds ratios are computed similarly, except that D odds ratios are computed for each effect, one for each of the D logits in the model.
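As a sketch of the bookkeeping only: with D logits, each effect has D slope estimates (one per logit), and exponentiating each gives the D odds ratios. The slopes below are hypothetical, and the sketch assumes each logit compares one response category with a common reference category.

```python
import math

# Hypothetical slope estimates for a single continuous effect X in a
# generalized logit model with D = 3 logits (each logit compares one response
# category with the reference category).
slopes_by_logit = {"logit 1": 0.40, "logit 2": -0.25, "logit 3": 1.10}

# One odds ratio per logit: exp(beta_d) for a unit increase in X.
odds_ratios = {name: math.exp(b) for name, b in slopes_by_logit.items()}

for name, psi in odds_ratios.items():
    print(f"{name}: odds ratio = {psi:.4f}")
```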