The POWER Procedure

Analyses in the LOGISTIC Statement

Likelihood Ratio Chi-Square Test for One Predictor (TEST=LRCHI)

The power computation formula is based on Shieh and O’Brien (1998); Shieh (2000); Self, Mauritsen, and Ohara (1992); and Hsieh (1989).

Define the following notation for a logistic regression analysis:

\[
\begin{aligned}
N &= \mbox{number of subjects \quad (NTOTAL)} \\
K &= \mbox{number of predictors (not counting the intercept)} \\
\mb{x} &= (x_1, \ldots, x_K)' = \mbox{random variables for predictor vector} \\
\mb{x}_{-1} &= (x_2, \ldots, x_K)' \\
\bmu &= (\mu_1, \ldots, \mu_K)' = \mr{E}\,\mb{x} = \mbox{mean predictor vector} \\
\mb{x}_i &= (x_{i1}, \ldots, x_{iK})' = \mbox{predictor vector for subject } i \quad (i \in 1, \ldots, N) \\
Y &= \mbox{random variable for response (0 or 1)} \\
Y_i &= \mbox{response for subject } i \quad (i \in 1, \ldots, N) \\
p_i &= \mr{Prob}(Y_i = 1 \mid \mb{x}_i) \quad (i \in 1, \ldots, N) \\
\phi &= \mr{Prob}(Y_i = 1 \mid \mb{x}_i = \bmu) \quad \mbox{(RESPONSEPROB)} \\
U_j &= \mbox{unit change for the $j$th predictor \quad (UNITS)} \\
\mr{OR}_j &= \mr{Odds}(Y_i = 1 \mid x_{ij} = c) \, / \, \mr{Odds}(Y_i = 1 \mid x_{ij} = c - U_j) \\
 &\qquad (c \mbox{ arbitrary},\ i \in 1, \ldots, N,\ j \in 1, \ldots, K) \\
 &\qquad \mbox{(TESTODDSRATIO if $j = 1$, COVODDSRATIOS if $j > 1$)} \\
\Psi_0 &= \mbox{intercept in full model \quad (INTERCEPT)} \\
\bPsi &= (\Psi_1, \ldots, \Psi_K)' = \mbox{regression coefficients in full model} \\
 &\qquad \mbox{($\Psi_1$ = TESTREGCOEFF, others = COVREGCOEFFS)} \\
\rho &= \mr{Corr}(\mb{x}_{-1}, x_1) \quad \mbox{(CORR)} \\
c_j &= \mbox{number of distinct possible values of $x_{ij}$ (for any $i$)} \quad (j \in 1, \ldots, K) \quad \mbox{(NBINS)} \\
x^\star_{gj} &= \mbox{$g$th possible value of $x_{ij}$ (for any $i$)} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \quad \mbox{(VARDIST)} \\
\pi_{gj} &= \mr{Prob}\left( x_{ij} = x^\star_{gj} \right) \mbox{ (for any $i$)} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \quad \mbox{(VARDIST)} \\
C &= \prod_{j=1}^{K} c_j = \mbox{number of possible values of $\mb{x}_i$ (for any $i$)} \\
\mb{x}^\star_m &= \mbox{$m$th possible value of $\mb{x}_i$} \quad (m \in 1, \ldots, C) \\
\pi_m &= \mr{Prob}\left( \mb{x}_i = \mb{x}^\star_m \right) \quad (m \in 1, \ldots, C)
\end{aligned}
\]

The logistic regression model is

\[  \log \left( \frac{p_ i}{1-p_ i} \right) = \Psi _0 + \bPsi ’\mb {x}_ i  \]
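The model links the linear predictor to the response probability through the logistic function. A minimal numerical sketch (the coefficient values below are arbitrary illustrations, not values from this section):

```python
import math

def response_prob(psi0, psi, x):
    """Probability that Y = 1 given predictor vector x, from
    log(p / (1 - p)) = psi0 + psi'x,
    which inverts to p = 1 / (1 + exp(-(psi0 + psi'x)))."""
    eta = psi0 + sum(pj * xj for pj, xj in zip(psi, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept -1, two predictors
p = response_prob(-1.0, [0.5, 0.25], [2.0, 4.0])   # eta = -1 + 1 + 1 = 1
```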

The hypothesis test of the first predictor variable is

\[
\begin{aligned}
H_0\colon\ & \Psi_1 = 0 \\
H_1\colon\ & \Psi_1 \ne 0
\end{aligned}
\]

Assuming independence among all predictor variables, $\pi _ m$ is defined as follows:

\[  \pi _ m = \prod _{j=1}^{K} \pi _{h(m,j) j} \quad (m \in 1, \ldots , C)  \]

where $h(m,j)$ is calculated according to the following algorithm:

\[
\begin{aligned}
&z = m; \\
&\mr{do} \quad j = K \quad \mr{to} \quad 1; \\
&\quad h(m,j) = \mr{mod}(z-1,\, c_j) + 1; \\
&\quad z = \mr{floor}\left((z-1) / c_j\right) + 1; \\
&\mr{end};
\end{aligned}
\]

This algorithm causes the rightmost elements of the vector $\{ h(m,1), \ldots , h(m,K) \} $ to vary fastest and the leftmost elements slowest as $m$ increases, as shown in the following table of $h(m,j)$ values:

\[
\begin{array}{cc|ccccc}
 & & \multicolumn{5}{c}{j} \\
\multicolumn{2}{c|}{h(m,j)} & 1 & 2 & \cdots & K-1 & K \\
\hline
 & 1 & 1 & 1 & \cdots & 1 & 1 \\
 & 2 & 1 & 1 & \cdots & 1 & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & \vdots & 1 & 1 & \cdots & 1 & c_K \\
 & \vdots & 1 & 1 & \cdots & 2 & 1 \\
 & \vdots & 1 & 1 & \cdots & 2 & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
m & \vdots & 1 & 1 & \cdots & 2 & c_K \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 1 \\
 & \vdots & c_1 & c_2 & \cdots & c_{K-1} & 2 \\
 & \vdots & \multicolumn{5}{c}{\vdots} \\
 & C & c_1 & c_2 & \cdots & c_{K-1} & c_K \\
\end{array}
\]

The $\mb {x}^\star _ m$ values are determined in a completely analogous manner.
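The indexing algorithm is ordinary mixed-radix counting. A direct Python transcription (a sketch; the function names are hypothetical, and the 1-based indexing of the pseudocode is preserved in the returned values):

```python
import math

def h_indices(m, c):
    """Return [h(m,1), ..., h(m,K)] for 1-based cell index m,
    given bin counts c = [c_1, ..., c_K].  The rightmost index
    varies fastest as m increases (mixed-radix counting)."""
    K = len(c)
    h = [0] * K
    z = m
    for j in range(K - 1, -1, -1):    # j = K down to 1 (0-based here)
        h[j] = (z - 1) % c[j] + 1     # mod(z-1, c_j) + 1
        z = (z - 1) // c[j] + 1       # floor((z-1) / c_j) + 1
    return h

def cell_prob(m, c, pi):
    """pi_m = prod_j pi_{h(m,j) j}, assuming independent predictors;
    pi[j] holds the marginal probabilities pi_{1j}, ..., pi_{c_j j}."""
    return math.prod(pi[j][g - 1] for j, g in enumerate(h_indices(m, c)))

# With c = [2, 3], the C = 6 rows enumerate all (h1, h2) combinations:
rows = [h_indices(m, [2, 3]) for m in range(1, 7)]
# rows == [[1,1], [1,2], [1,3], [2,1], [2,2], [2,3]]
```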

The discretization is handled as follows (unless the distribution is ordinal, or binomial with a sample size parameter at least as large as the requested number of bins): for $x_ j$, generate $c_ j$ quantiles at evenly spaced probability values such that each quantile is at the midpoint of a bin with probability $\frac{1}{c_ j}$. In other words,

\[
\begin{aligned}
x^\star_{gj} &= \left( \frac{g - 0.5}{c_j} \right)\mr{th} \, \mbox{ quantile of relevant distribution} \quad (g \in 1, \ldots, c_j)\ (j \in 1, \ldots, K) \\
\pi_{gj} &= \frac{1}{c_j} \quad \mbox{(same for all $g$)}
\end{aligned}
\]
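For a continuous predictor this midpoint-quantile binning is easy to reproduce. A sketch using only Python's standard library; the standard normal distribution here is an illustrative choice of VARDIST, not the only one:

```python
from statistics import NormalDist

def midpoint_bins(dist, c_j):
    """Discretize a continuous distribution into c_j equally probable
    bins, representing each bin by the quantile at the midpoint
    (g - 0.5) / c_j of its probability interval."""
    xs = [dist.inv_cdf((g - 0.5) / c_j) for g in range(1, c_j + 1)]
    probs = [1.0 / c_j] * c_j          # pi_gj = 1/c_j for every bin
    return xs, probs

# Four bins for a standard normal predictor: quantiles at
# probabilities 0.125, 0.375, 0.625, 0.875 (symmetric about 0)
xs, probs = midpoint_bins(NormalDist(mu=0.0, sigma=1.0), 4)
```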

The primary noncentrality for the power computation is

\[  \Delta ^\star = 2 \sum _{m=1}^ C \pi _ m \left[ b’(\theta _ m) \left(\theta _ m - \theta ^\star _ m \right) - \left( b(\theta _ m) - b(\theta ^\star _ m) \right) \right]  \]

where

\[
\begin{aligned}
b'(\theta) &= \frac{\exp(\theta)}{1 + \exp(\theta)} \\
b(\theta) &= \log\left( 1 + \exp(\theta) \right) \\
\theta_m &= \Psi_0 + \bPsi'\mb{x}^\star_m \\
\theta^\star_m &= \Psi^\star_0 + \bPsi^{\star\prime}\mb{x}^\star_m
\end{aligned}
\]

where

\[
\begin{aligned}
\Psi^\star_0 &= \Psi_0 + \Psi_1 \mu_1 = \mbox{intercept in reduced model, absorbing the tested predictor} \\
\bPsi^\star &= (0, \Psi_2, \ldots, \Psi_K)' = \mbox{coefficients in reduced model}
\end{aligned}
\]
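Putting the pieces together, $\Delta^\star$ is a weighted sum over the $C$ cells. A hypothetical Python sketch with arbitrary illustrative values; `b`, `bprime`, $\theta_m$, and $\theta^\star_m$ follow the definitions above:

```python
import math

def bprime(theta):
    """b'(theta) = exp(theta) / (1 + exp(theta))."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def b(theta):
    """b(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def noncentrality(psi0, psi, mu1, cells):
    """Delta* = 2 * sum_m pi_m [ b'(theta_m)(theta_m - theta*_m)
                                 - (b(theta_m) - b(theta*_m)) ],
    where the reduced model absorbs the tested predictor:
    psi0* = psi0 + psi1 * mu1 and psi* = (0, psi2, ..., psiK)'.
    cells is a list of (pi_m, x*_m) pairs."""
    psi0_star = psi0 + psi[0] * mu1
    psi_star = [0.0] + list(psi[1:])
    total = 0.0
    for pi_m, x in cells:
        theta = psi0 + sum(p * xj for p, xj in zip(psi, x))
        theta_star = psi0_star + sum(p * xj for p, xj in zip(psi_star, x))
        total += pi_m * (bprime(theta) * (theta - theta_star)
                         - (b(theta) - b(theta_star)))
    return 2.0 * total

# Hypothetical one-predictor example: x1 takes values -1 and 1 with
# probability 1/2 each (so mu1 = 0), psi0 = 0, psi1 = 0.5
delta = noncentrality(0.0, [0.5], 0.0, [(0.5, [-1.0]), (0.5, [1.0])])
```

Note that when $\Psi_1 = 0$ the full and reduced models coincide, so $\Delta^\star = 0$, as the formula requires.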

The power is

\[  \mr {power} = P\left(\chi ^2(1, \Delta ^\star N (1-\rho ^2)) \ge \chi ^2_{1-\alpha }(1)\right)  \]

The factor $(1-\rho ^2)$ is the adjustment for correlation between the predictor that is being tested and other predictors, from Hsieh (1989).
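Because the test has 1 degree of freedom, the noncentral chi-square tail probability can be evaluated with the standard normal distribution alone: if $Z \sim N(0,1)$, then $(Z + \sqrt{\lambda})^2 \sim \chi^2(1, \lambda)$. The following sketch uses this identity with Python's standard library; it is not PROC POWER's internal routine:

```python
from statistics import NormalDist
import math

_N = NormalDist()

def lrchi_power(delta_star, n, rho, alpha=0.05):
    """power = P( chi2(1, lambda) >= chi2_{1-alpha}(1) ) with
    lambda = Delta* * N * (1 - rho^2).  For df = 1,
    chi2(1, lambda) ~ (Z + sqrt(lambda))^2, so the tail
    probability reduces to two normal tail areas."""
    lam = delta_star * n * (1.0 - rho ** 2)
    crit = _N.inv_cdf(1.0 - alpha / 2.0) ** 2  # chi2_{1-alpha}(1) = z_{1-alpha/2}^2
    c = math.sqrt(crit)
    s = math.sqrt(lam)
    # P((Z + s)^2 >= c^2) = P(Z >= c - s) + P(Z <= -c - s)
    return (1.0 - _N.cdf(c - s)) + _N.cdf(-c - s)

# Hypothetical inputs: Delta* = 0.05, N = 200, rho = 0.3
p = lrchi_power(0.05, 200, 0.3)
```

A quick sanity check: with $\Delta^\star = 0$ the noncentrality vanishes and the power equals $\alpha$, and power increases with $N$.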

Alternative input parameterizations are handled by the following transformations:

\[
\begin{aligned}
\Psi_0 &= \log\left( \frac{\phi}{1-\phi} \right) - \bPsi'\bmu \\
\Psi_j &= \frac{\log(\mr{OR}_j)}{U_j} \quad (j \in 1, \ldots, K)
\end{aligned}
\]
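These transformations are direct to apply. A Python sketch with illustrative values (the function names are hypothetical):

```python
import math

def coeffs_from_odds_ratios(odds_ratios, units):
    """Psi_j = log(OR_j) / U_j for each predictor j."""
    return [math.log(or_j) / u_j for or_j, u_j in zip(odds_ratios, units)]

def intercept_from_response_prob(phi, psi, mu):
    """Psi_0 = logit(phi) - Psi'mu, where phi is the response
    probability at the mean predictor vector (RESPONSEPROB)."""
    return math.log(phi / (1.0 - phi)) - sum(p * m for p, m in zip(psi, mu))

# Hypothetical example: odds ratio 2 per unit change for the tested
# predictor, 1.5 for a covariate, response probability 0.5 at the mean
psi = coeffs_from_odds_ratios([2.0, 1.5], [1.0, 1.0])
psi0 = intercept_from_response_prob(0.5, psi, [0.0, 0.0])  # logit(0.5) = 0
```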