The FREQ Procedure

Odds Ratio and Relative Risks for 2 x 2 Tables

Odds Ratio

The odds ratio is a useful measure of association for a variety of study designs. For a retrospective design called a case-control study, the odds ratio can be used to estimate the relative risk when the probability of positive response is small (Agresti, 2002). In a case-control study, two independent samples are identified based on a binary (yes-no) response variable, and the conditional distribution of a binary explanatory variable is examined, within fixed levels of the response variable. See Stokes, Davis, and Koch (2012) and Agresti (2007).

The odds of a positive response (column 1) in row 1 is $n_{11} / n_{12}$ . Similarly, the odds of a positive response in row 2 is $n_{21} / n_{22}$ . The odds ratio is formed as the ratio of the row 1 odds to the row 2 odds. The odds ratio for a $2 \times 2$ table is defined as

$\mathit{OR} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} ~ n_{22}}{n_{12} ~ n_{21}}$

The odds ratio can be any nonnegative number. When the row and column variables are independent, the true value of the odds ratio equals 1. An odds ratio greater than 1 indicates that the odds of a positive response are higher in row 1 than in row 2. Values less than 1 indicate the odds of positive response are higher in row 2. The strength of association increases with the deviation from 1.

The transformation $G = (\mathit{OR}-1)/(\mathit{OR}+1)$ maps the odds ratio onto the interval [–1, 1), with G = 0 when $\mathit{OR} = 1$ ; G = –1 when $\mathit{OR} = 0$ ; and G approaching 1 as OR approaches infinity. For a $2 \times 2$ table, G equals the gamma statistic, which PROC FREQ computes when you specify the MEASURES option.
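The odds ratio and its gamma transform take only a few lines to compute. The following Python sketch is illustrative only; it is not PROC FREQ code. The function names and the example table are hypothetical. It also computes G directly from the counts as (n11 n22 − n12 n21)/(n11 n22 + n12 n21), which is algebraically the same quantity and stays finite when the odds ratio is infinite.

```python
def odds_ratio(n11: int, n12: int, n21: int, n22: int) -> float:
    """Sample odds ratio n11*n22 / (n12*n21) for a 2 x 2 table.

    Raises ZeroDivisionError when n12 * n21 == 0 (the OR is infinite there).
    """
    return (n11 * n22) / (n12 * n21)


def gamma_from_or(or_value: float) -> float:
    """G = (OR - 1) / (OR + 1): maps [0, inf) onto [-1, 1)."""
    return (or_value - 1.0) / (or_value + 1.0)


def gamma_from_counts(n11: int, n12: int, n21: int, n22: int) -> float:
    """The same quantity written in the cell counts (concordant minus
    discordant products over their sum); finite even when n12 * n21 == 0."""
    concordant, discordant = n11 * n22, n12 * n21
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical table: row 1 = (20, 10), row 2 = (10, 20)
or_hat = odds_ratio(20, 10, 10, 20)
print(or_hat, gamma_from_or(or_hat), gamma_from_counts(20, 10, 10, 20))  # 4.0 0.6 0.6
```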

The asymptotic $100(1-\alpha )$ % confidence limits for the odds ratio are

$\left( ~ \mathit{OR} \times \exp ( -z \sqrt {v} ), ~ ~ \mathit{OR} \times \exp ( z \sqrt {v} ) ~ \right)$

where

$v = \mr{Var} (\ln \mathit{OR}) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$

and z is the $100(1-\alpha /2)$ percentile of the standard normal distribution. If any of the four cell frequencies are zero, the estimates are not computed.
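The asymptotic limits are built on the log scale and then exponentiated. The following Python sketch assumes the standard library's `statistics.NormalDist` for the normal percentile z. The function name `or_wald_ci` is hypothetical, and the sketch is not PROC FREQ's implementation. Like the text, it returns nothing when a cell is zero.

```python
import math
from statistics import NormalDist


def or_wald_ci(n11, n12, n21, n22, alpha=0.05):
    """OR and its asymptotic 100(1-alpha)% limits, built on the log scale.

    Returns None when any cell is zero, matching the rule in the text.
    """
    if min(n11, n12, n21, n22) == 0:
        return None
    or_hat = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22   # Var(ln OR)
    z = NormalDist().inv_cdf(1 - alpha / 2)       # 100(1 - alpha/2) percentile
    half_width = z * math.sqrt(v)
    return or_hat, or_hat * math.exp(-half_width), or_hat * math.exp(half_width)


print(or_wald_ci(20, 10, 10, 20))   # hypothetical table
```

Because the interval is symmetric on the log scale, the product of the two limits equals the squared odds ratio.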

Score Confidence Limits for the Odds Ratio

Score confidence limits for the odds ratio (Miettinen and Nurminen, 1985) are computed by inverting score tests for the odds ratio. A score-based chi-square test statistic for the null hypothesis that the odds ratio equals $\theta$ can be expressed as

$Q(\theta ) = \{ n_{1 \cdot } \left( \hat{p}_1 - \tilde{p}_1 \right) \} ^2 ~ / ~ \left[ \{ n / (n-1) \} ~ \{ 1 / \left( n_{1 \cdot } \tilde{p}_1 ( 1 - \tilde{p}_1 ) \right) + 1 / \left( n_{2 \cdot } \tilde{p}_2 ( 1 - \tilde{p}_2 ) \right) \} ^{-1} \right]$

where $\hat{p}_1 = n_{11} / n_{1 \cdot }$ is the observed row 1 risk (proportion), and $\tilde{p}_1$ and $\tilde{p}_2$ are the maximum likelihood estimates of the row 1 and row 2 risks under the restriction that the odds ratio $\tilde{p}_1 (1 - \tilde{p}_2) / \{ \tilde{p}_2 (1 - \tilde{p}_1) \}$ equals $\theta$ . For more information, see Miettinen and Nurminen (1985) and Miettinen (1985, chapter 14).

The $100(1-\alpha )$ % score confidence interval for the odds ratio consists of all values of $\theta$ for which the test statistic $Q(\theta )$ falls in the acceptance region,

$\{ \theta : Q(\theta ) < \chi ^2_{1, \alpha } \}$

where $\chi ^2_{1, \alpha }$ is the $100(1-\alpha )$ percentile of the chi-square distribution with one degree of freedom. PROC FREQ finds the confidence limits by iterative computation. For more information about score confidence limits, see Agresti (2013).

By default, the score confidence limits include the bias correction factor $n/(n-1)$ in the denominator of $Q(\theta )$ (Miettinen and Nurminen, 1985, p. 217). If you specify the CL=SCORE(CORRECT=NO) option, PROC FREQ does not include this factor in the computation.

The maximum likelihood estimates of $p_1$ and $p_2$ , subject to the constraint that the odds ratio is $\theta$ , are computed as

$\tilde{p}_2 = \left( -b + \sqrt { b^2 - 4 a c } \right) / (2a) \hspace{.15in} \mr{and} \hspace{.15in} \tilde{p}_1 = \tilde{p}_2 \theta / \left( 1 + \tilde{p}_2 (\theta - 1) \right)$

where

$\begin{eqnarray*} a & = & n_{2 \cdot } (\theta - 1 ) \\ b & = & n_{1 \cdot } \theta + n_{2 \cdot } - n_{\cdot 1} (\theta - 1) \\ c & = & - n_{\cdot 1} \end{eqnarray*}$

For more information, see Miettinen and Nurminen (1985, pp. 217–218) and Miettinen (1985, chapter 14).
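The score interval combines three pieces from this section: the restricted maximum likelihood estimates, the statistic $Q(\theta)$ with its optional $n/(n-1)$ factor, and the inversion $Q(\theta) < \chi^2_{1,\alpha}$. The following Python sketch puts them together. It is a sketch under stated assumptions, not PROC FREQ's implementation. Bisection on $\ln\theta$ is a swapped-in choice, because the text does not describe the iterative algorithm that PROC FREQ uses. The sketch requires all four cells to be positive, so that the sample odds ratio, where $Q = 0$, can serve as the starting point. All function names are hypothetical. The quadratic root is evaluated in an algebraically equivalent rationalized form. This form avoids cancellation and also covers $\theta = 1$, where $a = 0$.

```python
import math
from statistics import NormalDist


def restricted_mle_or(n1, n2, m1, theta):
    """ML estimates (p1, p2) of the row risks when the odds ratio is theta.

    n1, n2 are the row totals; m1 is the column 1 total n_{.1}.
    """
    a = n2 * (theta - 1.0)
    b = n1 * theta + n2 - m1 * (theta - 1.0)
    c = -m1
    s = math.sqrt(max(0.0, b * b - 4.0 * a * c))
    # Same root as (-b + s) / (2a); the form 2c / (-b - s) avoids cancellation
    # when b >= 0 and also handles theta == 1 (a == 0).
    p2 = 2.0 * c / (-b - s) if b >= 0 else (-b + s) / (2.0 * a)
    p1 = p2 * theta / (1.0 + p2 * (theta - 1.0))
    return p1, p2


def score_q_or(n11, n12, n21, n22, theta, correct=True):
    """Score chi-square Q(theta) for H0: OR = theta."""
    n1, n2 = n11 + n12, n21 + n22
    n = n1 + n2
    p1t, p2t = restricted_mle_or(n1, n2, n11 + n21, theta)
    p1_hat = n11 / n1
    var = 1.0 / (1.0 / (n1 * p1t * (1 - p1t)) + 1.0 / (n2 * p2t * (1 - p2t)))
    if correct:                      # default bias correction n/(n-1)
        var *= n / (n - 1)
    return (n1 * (p1_hat - p1t)) ** 2 / var


def score_ci_or(n11, n12, n21, n22, alpha=0.05, correct=True):
    """Limits where Q(theta) crosses chi2_{1,alpha}, found by bisection."""
    if min(n11, n12, n21, n22) == 0:
        raise ValueError("sketch assumes all four cell counts are positive")
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi-square(1) percentile
    center = math.log((n11 * n22) / (n12 * n21))      # Q = 0 at the sample OR

    def excess(log_theta):
        return score_q_or(n11, n12, n21, n22, math.exp(log_theta), correct) - crit

    def limit(step):
        inside, outside = center, center + step
        while excess(outside) < 0:        # walk outward until Q exceeds crit
            inside, outside = outside, outside + step
        for _ in range(100):              # bisect the bracketing interval
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return math.exp(0.5 * (inside + outside))

    return limit(-1.0), limit(+1.0)


print(score_ci_or(20, 10, 10, 20))                 # hypothetical table
print(score_ci_or(20, 10, 10, 20, correct=False))  # without n/(n-1)
```

Dropping the $n/(n-1)$ factor makes $Q(\theta)$ slightly larger, so the uncorrected interval is slightly narrower.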

Exact Confidence Limits for the Odds Ratio

When you specify the OR option in the EXACT statement, PROC FREQ computes exact confidence limits for the odds ratio. Because this is a discrete problem, the confidence coefficient for the exact confidence interval is not exactly $(1-\alpha )$ but is at least $(1-\alpha )$ . Thus, these confidence limits are conservative. See Agresti (1992) for more information.

PROC FREQ computes exact confidence limits for the odds ratio by using an algorithm based on Thomas (1971). See also Gart (1971). The following two equations are solved iteratively to determine the lower and upper confidence limits, $\phi _1$ and $\phi _2$ :

$\begin{eqnarray*} \sum _{i=n_{11}}^{n_{\cdot 1}} \binom {n_{1 \cdot }}{i} \binom {n_{2 \cdot }}{n_{\cdot 1} - i} ~ \phi _1^ i ~ ~ / ~ ~ \sum _{i=0}^{n_{\cdot 1}} \binom {n_{1 \cdot }}{i} \binom {n_{2 \cdot }}{n_{\cdot 1}-i} ~ \phi _1^ i ~ & = & ~ \alpha /2 \\[0.10in] \sum _{i=0}^{n_{11}} \binom {n_{1 \cdot }}{i} \binom {n_{2 \cdot }}{n_{\cdot 1} - i} ~ \phi _2^ i ~ ~ / ~ ~ \sum _{i=0}^{n_{\cdot 1}} \binom {n_{1 \cdot }}{i} \binom {n_{2 \cdot }}{n_{\cdot 1} - i} ~ \phi _2^ i ~ & = & ~ \alpha /2 \end{eqnarray*}$

When the odds ratio equals zero, which occurs when either $n_{11} = 0$ or $n_{22} = 0$ , PROC FREQ sets the lower exact confidence limit to zero and determines the upper limit with level $\alpha$ . Similarly, when the odds ratio equals infinity, which occurs when either $n_{12} = 0$ or $n_{21} = 0$ , PROC FREQ sets the upper exact confidence limit to infinity and determines the lower limit with level $\alpha$ .
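Each of the two equations sets a tail probability of the noncentral hypergeometric distribution of $N_{11}$, given the margins, equal to $\alpha/2$. The following Python sketch solves them. It is not the Thomas (1971) algorithm that PROC FREQ uses. It substitutes plain bisection on $\ln\phi$ over an assumed bracket of [−60, 60]. It also works on the log scale, so that large powers of $\phi$ do not overflow. The function names are hypothetical. The zero and infinite cases follow the rule in the paragraph above: one limit is fixed, and the other is solved at level $\alpha$.

```python
import math


def tail_probs(n11, n1, n2, m1, log_phi):
    """(P(N11 >= n11), P(N11 <= n11)) under the noncentral hypergeometric
    distribution with the given margins and odds ratio exp(log_phi)."""
    lo, hi = max(0, m1 - n2), min(n1, m1)
    # log of C(n1, i) * C(n2, m1 - i) * phi**i for each feasible i
    logw = [math.log(math.comb(n1, i)) + math.log(math.comb(n2, m1 - i)) + i * log_phi
            for i in range(lo, hi + 1)]
    top = max(logw)
    w = [math.exp(x - top) for x in logw]   # rescaled to avoid overflow
    total = sum(w)
    k = n11 - lo
    return sum(w[k:]) / total, sum(w[:k + 1]) / total


def exact_ci_or(n11, n12, n21, n22, alpha=0.05):
    n1, n2, m1 = n11 + n12, n21 + n22, n11 + n21
    lo, hi = max(0, m1 - n2), min(n1, m1)
    if lo == hi:
        raise ValueError("margins allow only one table; no interval")

    def solve(which, target):
        # which=0: upper tail (increases with phi); which=1: lower tail (decreases)
        a, b = -60.0, 60.0                  # assumed bracket for log(phi)
        for _ in range(200):
            mid = 0.5 * (a + b)
            p = tail_probs(n11, n1, n2, m1, mid)[which]
            go_right = p < target if which == 0 else p > target
            if go_right:
                a = mid
            else:
                b = mid
        return math.exp(0.5 * (a + b))

    if n11 == lo:                  # sample OR is 0 (n11 == 0 or n22 == 0)
        return 0.0, solve(1, alpha)
    if n11 == hi:                  # sample OR is infinite (n12 == 0 or n21 == 0)
        return solve(0, alpha), math.inf
    return solve(0, alpha / 2), solve(1, alpha / 2)   # (phi_1, phi_2)


print(exact_ci_or(20, 10, 10, 20))   # hypothetical table
print(exact_ci_or(0, 10, 5, 5))      # zero cell: lower limit fixed at 0
```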

Relative Risks

These measures of relative risk are useful in cohort (prospective) study designs, where two samples are identified based on the presence or absence of an explanatory factor. The two samples are then followed forward in time and observed for the binary (yes-no) response variable under study. Relative risk measures are also useful in cross-sectional studies, where two variables are observed simultaneously. See Stokes, Davis, and Koch (2012) and Agresti (2007) for more information.

The column 1 relative risk is the ratio of the column 1 risk for row 1 to the column 1 risk for row 2. The column 1 risk for row 1 is the proportion of the row 1 observations classified in column 1,

$p_1 = n_{11} ~ / ~ n_{1 \cdot }$

Similarly, the column 1 risk for row 2 is

$p_2 = n_{21} ~ / ~ n_{2 \cdot }$

The column 1 relative risk is computed as

$\mathit{RR}_1 = p_1 ~ / ~ p_2$

A relative risk greater than 1 indicates that the probability of positive response is greater in row 1 than in row 2. Similarly, a relative risk less than 1 indicates that the probability of positive response is less in row 1 than in row 2. The strength of association increases with the deviation from 1.

Asymptotic $100(1-\alpha )$ % confidence limits for the column 1 relative risk are computed as

$\left( ~ \mathit{RR}_1 \times \exp ( -z \sqrt {v} ) , ~ ~ \mathit{RR}_1 \times \exp ( z \sqrt {v} ) ~ \right)$

where

$v = \mr{Var} (\ln \mathit{RR}_1) = \bigl ( (1-p_1) / n_{11} \bigr ) ~ + ~ \bigl ( (1-p_2) / n_{21} \bigr )$

and z is the $100(1-\alpha /2)$ percentile of the standard normal distribution. If either $n_{11}$ or $n_{21}$ is zero, the estimates are not computed.

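PROC FREQ computes the column 2 relative risk in the same way, using the column 2 risks $n_{12} / n_{1 \cdot }$ and $n_{22} / n_{2 \cdot }$ in place of $p_1$ and $p_2$ .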
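The relative risk and its asymptotic limits follow the same log-scale pattern as the odds ratio. The following Python sketch is illustrative and not PROC FREQ code. It assumes `statistics.NormalDist` for z, and the function names are hypothetical. The column 2 version simply swaps the two columns before calling the column 1 computation.

```python
import math
from statistics import NormalDist


def relrisk_col1(n11, n12, n21, n22, alpha=0.05):
    """Column 1 relative risk p1/p2 with asymptotic log-scale limits.

    Returns None when n11 or n21 is zero, as the text specifies.
    """
    if n11 == 0 or n21 == 0:
        return None
    p1 = n11 / (n11 + n12)                    # column 1 risk, row 1
    p2 = n21 / (n21 + n22)                    # column 1 risk, row 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21       # Var(ln RR_1)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(v)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


def relrisk_col2(n11, n12, n21, n22, alpha=0.05):
    """Column 2 relative risk: the same computation with the columns swapped."""
    return relrisk_col1(n12, n11, n22, n21, alpha)


print(relrisk_col1(20, 10, 10, 20))   # hypothetical table
print(relrisk_col2(20, 10, 10, 20))
```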

Score Confidence Limits for the Relative Risk

Score confidence limits for the relative risk (Miettinen and Nurminen, 1985; Farrington and Manning, 1990) are computed by inverting score tests for the relative risk. A score-based chi-square test statistic for the null hypothesis that the relative risk equals $\mathit{R_0}$ can be expressed as

$Q(\mathit{R_0}) = ( \hat{p}_1 - \mathit{R_0} \hat{p}_2 )^2 ~ / ~ \widetilde{\mr{Var}}(\mathit{R_0})$

where $\hat{p}_1$ and $\hat{p}_2$ are the observed row 1 and row 2 risks (proportions), respectively, and

$\widetilde{\mr{Var}}(\mathit{R_0}) = \left( n / (n-1) \right) ~ \left( ~ \tilde{p}_1 (1-\tilde{p}_1) / n_{1 \cdot } ~ +~ \mathit{R_0}^2 ~ \tilde{p}_2 (1-\tilde{p}_2) / n_{2 \cdot } ~ \right)$

where $\tilde{p}_1$ and $\tilde{p}_2$ are the maximum likelihood estimates of $p_1$ and $p_2$ , respectively, under the null hypothesis that the relative risk equals $\mathit{R_0}$ . For more information, see Miettinen and Nurminen (1985) and Miettinen (1985, chapter 13).

The $100(1-\alpha )$ % score confidence interval for the relative risk consists of all values of $\mathit{R_0}$ for which the test statistic $Q(\mathit{R_0})$ falls in the acceptance region,

$\{ \mathit{R_0} : Q(\mathit{R_0}) < \chi ^2_{1, \alpha } \}$

By default, the score confidence limits include the bias correction factor $n/(n-1)$ in the denominator of $Q(\mathit{R_0})$ (Miettinen and Nurminen, 1985, p. 217). If you specify the CL=SCORE(CORRECT=NO) option, PROC FREQ does not include this factor in the computation.

The maximum likelihood estimates of $p_1$ and $p_2$ , subject to the constraint that the relative risk is $\mathit{R_0}$ , are computed as

$\tilde{p}_1 = \left( -b - \sqrt {b^2 - 4ac} \right) / (2a) \hspace{.15in} \mr{and} \hspace{.15in} \tilde{p}_2 = \tilde{p}_1 / \mathit{R_0}$

where

$\begin{eqnarray*} a & = & 1 + \theta \\ b & = & - \left( \mathit{R_0} ( 1 + \theta \hat{p}_2 ) + \theta + \hat{p}_1 \right) \\ c & = & \mathit{R_0} ( \hat{p}_1 + \theta \hat{p}_2 ) \\ \theta & = & n_{2 \cdot } / n_{1 \cdot } \end{eqnarray*}$

For more information, see Farrington and Manning (1990, p. 1454) and Miettinen and Nurminen (1985, p. 217).
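The relative risk score interval can be sketched in the same way as the odds ratio score interval. The following Python code uses the Farrington and Manning quadratic given above and inverts $Q(\mathit{R_0}) < \chi^2_{1,\alpha}$ by bisection on $\ln \mathit{R_0}$. Bisection is an assumption; the text does not say which iteration PROC FREQ uses. The sketch also requires $0 < n_{11} < n_{1\cdot}$ and $0 < n_{21} < n_{2\cdot}$, so that the sample relative risk is a finite starting point. The root $(-b - \sqrt{b^2-4ac})/(2a)$ is evaluated in the equivalent form $2c/(-b + \sqrt{b^2-4ac})$. Because $b < 0$ whenever $\mathit{R_0} > 0$, this form avoids cancellation. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def restricted_mle_rr(x1, n1, x2, n2, r0):
    """ML estimates (p1, p2) under the null hypothesis p1 / p2 = r0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    theta = n2 / n1                   # here theta is the ratio of row totals
    a = 1 + theta
    b = -(r0 * (1 + theta * p2_hat) + theta + p1_hat)
    c = r0 * (p1_hat + theta * p2_hat)
    s = math.sqrt(max(0.0, b * b - 4 * a * c))
    p1_t = 2 * c / (-b + s)           # same root as (-b - s) / (2a)
    return p1_t, p1_t / r0


def score_q_rr(x1, n1, x2, n2, r0, correct=True):
    """Score chi-square Q(R0) for H0: RR = r0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p1_t, p2_t = restricted_mle_rr(x1, n1, x2, n2, r0)
    var = p1_t * (1 - p1_t) / n1 + r0 ** 2 * p2_t * (1 - p2_t) / n2
    if correct:                       # default bias correction n/(n-1)
        n = n1 + n2
        var *= n / (n - 1)
    return (p1_hat - r0 * p2_hat) ** 2 / var


def score_ci_rr(x1, n1, x2, n2, alpha=0.05, correct=True):
    if not (0 < x1 < n1 and 0 < x2 < n2):
        raise ValueError("sketch assumes 0 < x1 < n1 and 0 < x2 < n2")
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi-square(1) percentile
    center = math.log((x1 / n1) / (x2 / n2))          # Q = 0 at the sample RR

    def excess(log_r):
        return score_q_rr(x1, n1, x2, n2, math.exp(log_r), correct) - crit

    def limit(step):
        inside, outside = center, center + step
        while excess(outside) < 0:    # walk outward until Q exceeds crit
            inside, outside = outside, outside + step
        for _ in range(100):          # bisect the bracketing interval
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return math.exp(0.5 * (inside + outside))

    return limit(-1.0), limit(+1.0)


# Hypothetical data: 20 of 30 in row 1 and 10 of 30 in row 2 fall in column 1
print(score_ci_rr(20, 30, 10, 30))
```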

Exact Unconditional Confidence Limits for the Relative Risk

If you specify the RELRISK option in the EXACT statement, PROC FREQ provides exact unconditional confidence limits for the relative risk. PROC FREQ computes the confidence limits by inverting two separate one-sided tests (tail method), where the size of each test is at most $\alpha /2$ and the confidence coefficient is at least $(1-\alpha )$ . Exact conditional methods, described in the section Exact Statistics, do not apply to the relative risk due to the presence of a nuisance parameter (Agresti, 1992). The unconditional approach eliminates the nuisance parameter by maximizing the p-value over all possible values of the parameter (Santner and Snell, 1980).

By default, PROC FREQ uses the unstandardized relative risk as the test statistic in the confidence limit computations. If you specify the RELRISK(METHOD=SCORE) option, the procedure uses the relative risk score statistic (Chan and Zhang, 1999). The score statistic is a less discrete statistic than the raw relative risk and produces less conservative confidence limits (Agresti and Min, 2001). See also Santner et al. (2007) for comparisons of methods for computing exact confidence limits.

See the section Exact Unconditional Confidence Limits for the Risk Difference for a description of the method that PROC FREQ uses to compute confidence limits for the relative risk. The test statistic for the relative risk computation is either the unstandardized relative risk (by default) or the relative risk score statistic (if you specify the RELRISK(METHOD=SCORE) option). PROC FREQ uses the following form of the unstandardized relative risk, which adds 0.5 to each column 1 frequency and to each row total so that the statistic is defined when there are zero table cells (Gart and Nam, 1988):

$\mathit{rr} = \frac{ (n_{11} + 0.5) ~ / ~ ( n_{1 \cdot } + 0.5 ) }{ (n_{21} + 0.5) ~ / ~ ( n_{2 \cdot } + 0.5 ) }$
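A direct transcription of this adjusted statistic shows why it stays finite and positive when a column 1 count is zero. The following Python snippet is illustrative only; the function name `rr_adjusted` is hypothetical.

```python
def rr_adjusted(n11: int, n1: int, n21: int, n2: int) -> float:
    """Unstandardized relative risk with 0.5 added to each column 1 count
    and each row total; finite and positive even when n11 or n21 is 0."""
    return ((n11 + 0.5) / (n1 + 0.5)) / ((n21 + 0.5) / (n2 + 0.5))


print(rr_adjusted(0, 10, 5, 10))   # equal row totals, so this is 0.5 / 5.5
print(rr_adjusted(5, 10, 0, 10))   # and this is 5.5 / 0.5
```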

If you specify the RELRISK(METHOD=SCORE) option, PROC FREQ uses the relative risk score statistic (Miettinen and Nurminen, 1985; Farrington and Manning, 1990). This test statistic is computed as

$z = ( \hat{p}_1 - \mathit{R_0} \hat{p}_2 ) ~ / ~ \mr{se}(\mathit{rr})$

where

$\mr{se}(\mathit{rr}) = \sqrt { \tilde{p}_1 (1-\tilde{p}_1) / n_{1 \cdot } ~ +~ \mathit{R_0}^2 ~ \tilde{p}_2 (1-\tilde{p}_2) / n_{2 \cdot } }$

where $\tilde{p}_1$ and $\tilde{p}_2$ are the maximum likelihood estimates of $p_1$ and $p_2$ under the null hypothesis that the relative risk equals $\mathit{R_0}$ . The maximum likelihood solution is

$\tilde{p}_1 = ( -b - \sqrt {b^2 - 4ac} ) / (2a) \hspace{.15in} \mr{and} \hspace{.15in} \tilde{p}_2 = \tilde{p}_1 / \mathit{R_0}$

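where

$\begin{eqnarray*} a & = & 1 + \theta \\ b & = & - \left( \mathit{R_0} ( 1 + \theta \hat{p}_2 ) + \theta + \hat{p}_1 \right) \\ c & = & \mathit{R_0} ( \hat{p}_1 + \theta \hat{p}_2 ) \\ \theta & = & n_{2 \cdot } / n_{1 \cdot } \end{eqnarray*}$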

For more information, see Farrington and Manning (1990, p. 1454) and Miettinen and Nurminen (1985, p. 217).
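The following Python function sketches the score statistic z for a single table and null value $\mathit{R_0}$. Unlike $Q(\mathit{R_0})$ in the score confidence limits, it has no $n/(n-1)$ factor. The function name is hypothetical. The restricted estimate uses the rationalized form of the same quadratic root. The text does not say how PROC FREQ treats tables where the standard error is zero (for example, $n_{11} = n_{21} = 0$). This sketch returns NaN for them as an explicit assumption. The sketch does not reproduce the unconditional maximization over the nuisance parameter.

```python
import math


def rr_score_z(x1, n1, x2, n2, r0):
    """Relative risk score statistic z for H0: p1 / p2 = r0 (no n/(n-1) factor)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    theta = n2 / n1
    a = 1 + theta
    b = -(r0 * (1 + theta * p2_hat) + theta + p1_hat)
    c = r0 * (p1_hat + theta * p2_hat)
    s = math.sqrt(max(0.0, b * b - 4 * a * c))
    p1_t = 2 * c / (-b + s)           # equals (-b - s) / (2a)
    p2_t = p1_t / r0
    se = math.sqrt(p1_t * (1 - p1_t) / n1 + r0 ** 2 * p2_t * (1 - p2_t) / n2)
    if se == 0.0:
        return math.nan               # assumption: undefined, e.g. x1 = x2 = 0
    return (p1_hat - r0 * p2_hat) / se


# Hypothetical data: 10/20 versus 5/20, so the sample relative risk is 2
for r0 in (1.0, 2.0, 4.0):
    print(r0, rr_score_z(10, 20, 5, 20, r0))
```

The statistic is zero at the sample relative risk, positive for smaller null values, and negative for larger ones.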