The POWER Procedure

Analyses in the LOGISTIC Statement

Likelihood Ratio Chi-Square Test for One Predictor (TEST=LRCHI)

The power computation formula is based on Shieh and O’Brien (1998); Shieh (2000); Self, Mauritsen, and Ohara (1992); and Hsieh (1989).

Define the following notation for a logistic regression analysis:

\[
\begin{aligned}
N &= \mbox{number of subjects \quad (NTOTAL)} \\
K &= \mbox{number of predictors (not counting the intercept)} \\
\mb{x} &= (x_1, \ldots, x_K)' = \mbox{random variables for predictor vector} \\
\mb{x}_{-1} &= (x_2, \ldots, x_K)' \\
\bmu &= (\mu_1, \ldots, \mu_K)' = \mr{E}\,\mb{x} = \mbox{mean predictor vector} \\
\mb{x}_i &= (x_{i1}, \ldots, x_{iK})' = \mbox{predictor vector for subject } i \quad (i \in 1, \ldots, N) \\
Y &= \mbox{random variable for response (0 or 1)} \\
Y_i &= \mbox{response for subject } i \quad (i \in 1, \ldots, N) \\
p_i &= \mr{Prob}(Y_i = 1 \mid \mb{x}_i) \quad (i \in 1, \ldots, N) \\
\phi &= \mr{Prob}(Y_i = 1 \mid \mb{x}_i = \bmu) \quad \mbox{(RESPONSEPROB)} \\
U_j &= \mbox{unit change for the $j$th predictor \quad (UNITS)} \\
\mr{OR}_j &= \mr{Odds}(Y_i = 1 \mid x_{ij} = c) \, / \, \mr{Odds}(Y_i = 1 \mid x_{ij} = c - U_j) \\
 &\qquad (c \mbox{ arbitrary},\ i \in 1, \ldots, N,\ j \in 1, \ldots, K) \\
 &\qquad \mbox{(TESTODDSRATIO if $j = 1$, COVODDSRATIOS if $j > 1$)} \\
\Psi_0 &= \mbox{intercept in full model \quad (INTERCEPT)} \\
\bPsi &= (\Psi_1, \ldots, \Psi_K)' = \mbox{regression coefficients in full model} \\
 &\qquad \mbox{($\Psi_1$ = TESTREGCOEFF, others = COVREGCOEFFS)} \\
\rho &= \mr{Corr}(\mb{x}_{-1}, x_1) \quad \mbox{(CORR)} \\
c_j &= \mbox{number of distinct possible values of $x_{ij}$ (for any $i$)} \quad (j \in 1, \ldots, K) \quad \mbox{(NBINS)} \\
x^\star_{gj} &= \mbox{$g$th possible value of $x_{ij}$ (for any $i$)} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \quad \mbox{(VARDIST)} \\
\pi_{gj} &= \mr{Prob}\left( x_{ij} = x^\star_{gj} \right) \mbox{ (for any $i$)} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \quad \mbox{(VARDIST)} \\
C &= \prod_{j=1}^{K} c_j = \mbox{number of possible values of $\mb{x}_i$ (for any $i$)} \\
\mb{x}^\star_m &= \mbox{$m$th possible value of $\mb{x}_i$} \quad (m \in 1, \ldots, C) \\
\pi_m &= \mr{Prob}\left( \mb{x}_i = \mb{x}^\star_m \right) \quad (m \in 1, \ldots, C)
\end{aligned}
\]

The logistic regression model is

\[  \log \left( \frac{p_ i}{1-p_ i} \right) = \Psi _0 + \bPsi ’\mb {x}_ i  \]
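The model links the linear predictor to the response probability through the logistic function. A minimal numerical sketch (the coefficient values below are arbitrary illustrations, not values from this section):

```python
import math

def response_prob(psi0, psi, x):
    """Probability that Y = 1 given predictor vector x, from
    log(p / (1 - p)) = psi0 + psi'x,
    which inverts to p = 1 / (1 + exp(-(psi0 + psi'x)))."""
    eta = psi0 + sum(pj * xj for pj, xj in zip(psi, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept -1, two predictors
p = response_prob(-1.0, [0.5, 0.25], [2.0, 4.0])   # eta = -1 + 1 + 1 = 1
```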

The hypothesis test of the first predictor variable is

\[
\begin{aligned}
H_0\colon\ & \Psi_1 = 0 \\
H_1\colon\ & \Psi_1 \ne 0
\end{aligned}
\]

Assuming independence among all predictor variables, $\pi _ m$ is defined as follows:

\[  \pi _ m = \prod _{j=1}^{K} \pi _{h(m,j) j} \quad (m \in 1, \ldots , C)  \]

where $h(m,j)$ is calculated according to the following algorithm:

\[
\begin{aligned}
&z = m; \\
&\mr{do} \quad j = K \quad \mr{to} \quad 1; \\
&\quad h(m,j) = \mr{mod}(z-1,\, c_j) + 1; \\
&\quad z = \mr{floor}\left((z-1) / c_j\right) + 1; \\
&\mr{end};
\end{aligned}
\]

This algorithm causes the rightmost elements of the vector $\{ h(m,1), \ldots , h(m,K) \} $ to vary fastest and the leftmost elements slowest as $m$ increases, as shown in the following table of $h(m,j)$ values:

\[
\begin{array}{cc|ccccc}
 & & \multicolumn{5}{c}{j} \\
\multicolumn{2}{c|}{h(m,j)} & 1 & 2 & \cdots & K-1 & K \\
\hline
 & 1 & 1 & 1 & \cdots & 1 & 1 \\
 & 2 & 1 & 1 & \cdots & 1 & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & \vdots & 1 & 1 & \cdots & 1 & c_K \\
 & \vdots & 1 & 1 & \cdots & 2 & 1 \\
 & \vdots & 1 & 1 & \cdots & 2 & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
m & \vdots & 1 & 1 & \cdots & 2 & c_K \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 1 \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & C & c_1 & c_2 & \cdots & c_{K-1} & c_K \\
\end{array}
\]

The $\mb {x}^\star _ m$ values are determined in a completely analogous manner.
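The indexing algorithm is ordinary mixed-radix counting. A direct Python transcription (a sketch; the function names are hypothetical, and the 1-based indexing of the pseudocode is preserved in the returned values):

```python
import math

def h_indices(m, c):
    """Return [h(m,1), ..., h(m,K)] for 1-based cell index m,
    given bin counts c = [c_1, ..., c_K].  The rightmost index
    varies fastest as m increases (mixed-radix counting)."""
    K = len(c)
    h = [0] * K
    z = m
    for j in range(K - 1, -1, -1):    # j = K down to 1 (0-based here)
        h[j] = (z - 1) % c[j] + 1     # mod(z-1, c_j) + 1
        z = (z - 1) // c[j] + 1       # floor((z-1) / c_j) + 1
    return h

def cell_prob(m, c, pi):
    """pi_m = prod_j pi_{h(m,j) j}, assuming independent predictors;
    pi[j] holds the marginal probabilities pi_{1j}, ..., pi_{c_j j}."""
    return math.prod(pi[j][g - 1] for j, g in enumerate(h_indices(m, c)))

# With c = [2, 3], the C = 6 rows enumerate all (h1, h2) combinations:
rows = [h_indices(m, [2, 3]) for m in range(1, 7)]
# rows == [[1,1], [1,2], [1,3], [2,1], [2,2], [2,3]]
```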

The discretization is handled as follows (unless the distribution is ordinal, or binomial with a sample size parameter at least as large as the requested number of bins): for $x_ j$, generate $c_ j$ quantiles at evenly spaced probability values such that each quantile is at the midpoint of a bin with probability $\frac{1}{c_ j}$. In other words,

\[
\begin{aligned}
x^\star_{gj} &= \left( \frac{g - 0.5}{c_j} \right)\mr{th} \, \mbox{ quantile of relevant distribution} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \\
\pi_{gj} &= \frac{1}{c_j} \quad \mbox{(same for all $g$)}
\end{aligned}
\]
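For a continuous predictor this midpoint-quantile binning is easy to reproduce. A sketch using only Python's standard library; the standard normal distribution here is an illustrative choice of VARDIST, not the only one:

```python
from statistics import NormalDist

def midpoint_bins(dist, c_j):
    """Discretize a continuous distribution into c_j equally probable
    bins, representing each bin by the quantile at the midpoint
    (g - 0.5) / c_j of its probability interval."""
    xs = [dist.inv_cdf((g - 0.5) / c_j) for g in range(1, c_j + 1)]
    probs = [1.0 / c_j] * c_j          # pi_gj = 1/c_j for every bin
    return xs, probs

# Four bins for a standard normal predictor: quantiles at
# probabilities 0.125, 0.375, 0.625, 0.875 (symmetric about 0)
xs, probs = midpoint_bins(NormalDist(mu=0.0, sigma=1.0), 4)
```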

The primary noncentrality for the power computation is

\[  \Delta ^\star = 2 \sum _{m=1}^ C \pi _ m \left[ b’(\theta _ m) \left(\theta _ m - \theta ^\star _ m \right) - \left( b(\theta _ m) - b(\theta ^\star _ m) \right) \right]  \]

where

\[
\begin{aligned}
b'(\theta) &= \frac{\exp(\theta)}{1 + \exp(\theta)} \\
b(\theta) &= \log\left( 1 + \exp(\theta) \right) \\
\theta_m &= \Psi_0 + \bPsi'\mb{x}^\star_m \\
\theta^\star_m &= \Psi^\star_0 + \bPsi^{\star\prime}\mb{x}^\star_m
\end{aligned}
\]

where

\[
\begin{aligned}
\Psi^\star_0 &= \Psi_0 + \Psi_1 \mu_1 = \mbox{intercept in reduced model, absorbing the tested predictor} \\
\bPsi^\star &= (0, \Psi_2, \ldots, \Psi_K)' = \mbox{coefficients in reduced model}
\end{aligned}
\]
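Putting the pieces together, $\Delta^\star$ is a weighted sum over the $C$ cells. A hypothetical Python sketch with arbitrary illustrative values; `b`, `bprime`, $\theta_m$, and $\theta^\star_m$ follow the definitions above:

```python
import math

def bprime(theta):
    """b'(theta) = exp(theta) / (1 + exp(theta))."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def b(theta):
    """b(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def noncentrality(psi0, psi, mu1, cells):
    """Delta* = 2 * sum_m pi_m [ b'(theta_m)(theta_m - theta*_m)
                                 - (b(theta_m) - b(theta*_m)) ],
    where the reduced model absorbs the tested predictor:
    psi0* = psi0 + psi1 * mu1 and psi* = (0, psi2, ..., psiK)'.
    cells is a list of (pi_m, x*_m) pairs."""
    psi0_star = psi0 + psi[0] * mu1
    psi_star = [0.0] + list(psi[1:])
    total = 0.0
    for pi_m, x in cells:
        theta = psi0 + sum(p * xj for p, xj in zip(psi, x))
        theta_star = psi0_star + sum(p * xj for p, xj in zip(psi_star, x))
        total += pi_m * (bprime(theta) * (theta - theta_star)
                         - (b(theta) - b(theta_star)))
    return 2.0 * total

# Hypothetical one-predictor example: x1 takes values -1 and 1 with
# probability 1/2 each (so mu1 = 0), psi0 = 0, psi1 = 0.5
delta = noncentrality(0.0, [0.5], 0.0, [(0.5, [-1.0]), (0.5, [1.0])])
```

Note that when $\Psi_1 = 0$ the full and reduced models coincide, so $\Delta^\star = 0$, as the formula requires.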

The power is

\[  \mr {power} = P\left(\chi ^2(1, \Delta ^\star N (1-\rho ^2)) \ge \chi ^2_{1-\alpha }(1)\right)  \]

The factor $(1-\rho ^2)$ is the adjustment for correlation between the predictor that is being tested and other predictors, from Hsieh (1989).
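Because the test has 1 degree of freedom, the noncentral chi-square tail probability can be evaluated with the standard normal distribution alone: if $Z \sim N(0,1)$, then $(Z + \sqrt{\lambda})^2 \sim \chi^2(1, \lambda)$. The following sketch uses this identity with Python's standard library; it is not PROC POWER's internal routine:

```python
from statistics import NormalDist
import math

_N = NormalDist()

def lrchi_power(delta_star, n, rho, alpha=0.05):
    """power = P( chi2(1, lambda) >= chi2_{1-alpha}(1) ) with
    lambda = Delta* * N * (1 - rho^2).  For df = 1,
    chi2(1, lambda) ~ (Z + sqrt(lambda))^2, so the tail
    probability reduces to two normal tail areas."""
    lam = delta_star * n * (1.0 - rho ** 2)
    crit = _N.inv_cdf(1.0 - alpha / 2.0) ** 2  # chi2_{1-alpha}(1) = z_{1-alpha/2}^2
    c = math.sqrt(crit)
    s = math.sqrt(lam)
    # P((Z + s)^2 >= c^2) = P(Z >= c - s) + P(Z <= -c - s)
    return (1.0 - _N.cdf(c - s)) + _N.cdf(-c - s)

# Hypothetical inputs: Delta* = 0.05, N = 200, rho = 0.3
p = lrchi_power(0.05, 200, 0.3)
```

A quick sanity check: with $\Delta^\star = 0$ the noncentrality vanishes and the power equals $\alpha$, and power increases with $N$.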

Alternative input parameterizations are handled by the following transformations:

\[
\begin{aligned}
\Psi_0 &= \log\left( \frac{\phi}{1-\phi} \right) - \bPsi'\bmu \\
\Psi_j &= \frac{\log(\mr{OR}_j)}{U_j} \quad (j \in 1, \ldots, K)
\end{aligned}
\]
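These transformations are direct to apply. A Python sketch with illustrative values (the function names are hypothetical):

```python
import math

def coeffs_from_odds_ratios(odds_ratios, units):
    """Psi_j = log(OR_j) / U_j for each predictor j."""
    return [math.log(or_j) / u_j for or_j, u_j in zip(odds_ratios, units)]

def intercept_from_response_prob(phi, psi, mu):
    """Psi_0 = logit(phi) - Psi'mu, where phi is the response
    probability at the mean predictor vector (RESPONSEPROB)."""
    return math.log(phi / (1.0 - phi)) - sum(p * m for p, m in zip(psi, mu))

# Hypothetical example: odds ratio 2 per unit change for the tested
# predictor, 1.5 for a covariate, response probability 0.5 at the mean
psi = coeffs_from_odds_ratios([2.0, 1.5], [1.0, 1.0])
psi0 = intercept_from_response_prob(0.5, psi, [0.0, 0.0])  # logit(0.5) = 0
```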