Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
The LOGISTIC Procedure

Exact Conditional Analysis

Asymptotic methods may be inadequate when sample sizes are small or the data are sparse or skewed. In such cases, the asymptotic p-values are not close approximations to the true p-values. Exact conditional inference remains valid in such situations.

The following sections summarize the unconditional and conditional analyses, discuss the network algorithm used in the computations and the necessary computational resources, and provide details on the statistics displayed in the output tables.

Background

The theory of exact conditional logistic regression analysis was originally laid out by Cox (1970). Other useful references include Cox and Snell (1989), Agresti (1990, 1992), and Mehta and Patel (1995).

Consider N independent Bernoulli random variables Y1,...,YN having observed values y0 = (y01,...,y0N)'. For each observation i=1,...,N, let xi = (xi1,...,xim,xi,m+1,...,xi,m+n)' be an m+n=s vector of explanatory variables, and denote X = (x1,...,xN)'. Let p_i=p(x_i)=\Pr(Y_i=1 | x_i) be the event probability for each i=1,...,N, and denote p = (p1,...,pN)'. Then the logistic regression model is {logit}(p)=X{\beta}, or

{logit}(p_i)=\log(\frac{p_i}{1-p_i})=x_i'{\beta}
where {\beta}=(\beta_1,...,\beta_{s})' is the unknown parameter vector.

Unconditional likelihood inference is based on maximizing the likelihood function:

L({\beta})=\prod_{i=1}^N p_i^{y_{0i}}(1-p_i)^{1-y_{0i}}=\frac{\exp(y_0'X{\beta})}{\prod_{i=1}^N[1+\exp(x_i'{\beta})]}

To perform conditional inference, first observe that the sufficient statistics for the \beta_j in the unconditional likelihood function are the corresponding T_j=\sum_{i=1}^N y_i x_{ij},where yi is a realization of Yi. To create the probability density function (pdf) for T = (T1,...,Ts)', sum over all binary sequences y that generate an observable t:

\Pr(T=t)=\frac{C(t)\exp(t'{\beta})}{\prod_{i=1}^N[1+\exp(x_i'{\beta})]}
where C(t)=||\{y: y'X=t'\}|| is the number of sequences y that generate t. Suppose the m parameters {\beta}_0=(\beta_{1},...,\beta_{m})' are nuisance parameters; that is, the current analysis is geared toward the last n parameters {\beta}_1=(\beta_{m+1},...,\beta_{m+n})'. Denote the sufficient statistics for the nuisance parameters as T0 = (T1,...,Tm)', the corresponding observed values as t0, and the corresponding columns of X as X0. Similarly, define T1, t1, and X1 for the parameters of interest. The nuisance parameters can be removed from the analysis by conditioning on their sufficient statistics to create the conditional likelihood
\Pr(T_1=t_1|{T}_0=t_0)=\frac{\Pr(T=t)}{\Pr(T_0=t_0)}=\frac{C(t)\exp(t_1'{\beta}_1)} {\sum_{u}C(u,t_0)\exp(u'{\beta}_1)}
where C(u,t0) is the number of vectors y such that y'X1 = u and y'X0 = t0.

Conditional asymptotic inference is performed by maximizing the conditional likelihood and producing conditional statistics similar to those for the unconditional likelihood case.

Conditional exact inference is based on generating the conditional distribution for the parameters of interest. This distribution is called the permutation or exact conditional distribution. The conditional pdf \Pr(T_1=t_1|{T}_0=t_0) is denoted as f_{\mathbf{\beta}_1}(t_1|{t}_0).

Computational Algorithm

The goal of the exact conditional analysis is to determine how likely the observed response y0 is with respect to all 2N possible responses y = (y1,...,yN)'. One way to proceed is to generate every y vector for which y'X0 = t0 and count the number of vectors y for which y'X1 is equal to each unique t1. Generating the conditional distribution from complete enumeration of the joint distribution is conceptually simple; however, this method becomes computationally infeasible very quickly. For example, if you had only 30 observations, you'd have to scan through 230 different y vectors.

PROC LOGISTIC employs a network algorithm developed by Hirji, Mehta, and Patel (1987) to generate and count the y vectors. The algorithm is based on the following observation: given any y = (y1,...,yN)' and a design X = (x1,...,xN)', let y(i) = (y1,...,yi)' and X_{(i)}=(x_{1},...,x_{i})'=(x_{1,1} & ... & x_{1,s}\ \vdots & &\vdots \ x_{i,1} & ... & x_{i,s}) be the first i rows of each matrix. Write the sufficient statistic based on these i rows as t'(i) = y(i)'X(i). A recursion relation results: t(i+1) = t(i) + yi+1xi+1. Combining this relation with a method of determining which y vectors produce the t0 margins makes the generation of the permutation distribution feasible.

The bulk of the computation time and memory is consumed by the creation of the exact joint distribution. After the joint distribution for a set of effects is created, the computational effort required to produce hypothesis tests and parameter estimates for any subset of the effects is (relatively) trivial.

Computational Resources

PROC LOGISTIC uses a relatively fast and efficient algorithm for the exact analyses. This recently developed algorithm, together with improvements in computer power, now make it feasible to perform exact computations for data sets where previously only asymptotic methods could be applied. Nevertheless, there are still large problems that may require a prohibitive amount of time and memory for exact computations, depending on the speed and memory available on your computer. For large problems, consider whether exact methods are really needed or whether asymptotic methods might give results quite close to the exact results, while requiring much less computer time and memory.

A formula does not exist that can predict the amount of time and memory necessary to generate the permutation distributions for a particular problem. The time and memory required depend on several factors, including the total sample size, the number of parameters of interest, the number of nuisance parameters, and the order in which the parameters are processed. If you run out of memory, refer to the SAS Companion for your system to see how to allocate more.

You can use the MAXTIME= option in the EXACTOPTIONS option to limit the total amount of time PROC LOGISTIC uses to derive all of the exact distributions. If PROC LOGISTIC does not finish within that time, the procedure terminates. If you need to derive several distributions, it may be more feasible to request one distribution at a time.

At any time while PROC LOGISTIC is deriving the distributions, you can terminate the computations by pressing the system interrupt key sequence (refer to the SAS Companion for your system) and choosing to stop computations.

Hypothesis Tests

Consider testing the null hypothesis H_0\colon{\beta}_1=0 against the alternative H_A\colon{\beta}_1\neq{0}, conditional on T0 = t0. Under the null hypothesis, the test statistic for the exact probability test is just f_{\mathbf{\beta}_1=0}(t_1|{t}_0), while the corresponding p-value is the probability of getting a less likely (more extreme) statistic,
p(t_1|{t}_0)=\sum_{u\in\Omega_p} f_{0}(u|{t}_0)
where \Omega_p=\{u\colon for some y, y'X1 = u, y'X0 = t0, and f_{0}(u|{t}_0) \le f_{0}(t_1|{t}_0)\}.

For the exact conditional scores test, the conditional mean {\mu}_1 and variance matrix {{\Sigma}}_1 of the T1 (conditional on T0 = t0) are calculated, and the score statistic for the observed value,

s=(t_1-{\mu}_1)'{{\Sigma}}_1^{-1}(t_1-{\mu}_1)
is compared to the score for each member of the distribution
S(T_1)=(T_1-{\mu}_1)'{{\Sigma}}_1^{-1}(T_1-{\mu}_1)
The resulting p-value is
p(t_1|{t}_0)=Pr(S\ge s)=\sum_{u\in\Omega_s} f_{0}(u|{t}_0)
where \Omega_s=\{u\colon for some y, y'X1 = u, y'X0 = t0, and S(u) \ge s\}.

The mid-p statistic was originally proposed by Lancaster (1961) to compensate for the discreteness of the distribution. Hirji, Tsiatis, and Mehta (1989) recommend its use with small or sparse data sets; Vollset, Hirji, and Afifi (1991) suggest the statistic for matched case-control studies; and Hirji and Tang (1998) recommend the statistic for tests of trend. The mid-p is defined as

\frac{1}2f_{\mathbf{\beta}_1}(t_1|{t}_0)+\sum_{u\in\Omega} f_{\mathbf{\beta}_1}(u|{t}_0)
where \Omega is either \Omega_s or \Omega_p except that strict inequalities are used.

Inference for a Single Parameter

Exact parameter estimates are derived for a single parameter \beta_i by regarding all the other parameters {\beta}_0=(\beta_1,...,\beta_{i-1},\beta_{i+1},...,\beta_{s})' as nuisance parameters. The appropriate sufficient statistics are T1 = Ti and T0 = (T1,...,Ti-1,Ti+1,...,Ts)', with their observed values denoted by the lowercase t. Hence, the conditional pdf used to create the parameter estimate for \beta_iis
f_{\beta_i}(t_i|{t}_0)=\frac{C(t_i,t_0)\exp(t_i\beta_i)} {\sum_{u\in\Omega}C(u,t_0)\exp(u\beta_i)}
for \Omega=\{u\colon {for some}y, T_i=u {and}T_0=t_0\}.

The maximum exact conditional likelihood estimate is the quantity \hat\beta_i which maximizes the conditional pdf. A Newton-Raphson algorithm is used to perform this search. However, if the observed ti attains either its minimum or maximum value in the permutation distribution (that is, either t_i=\min\{u:u\in\Omega\} or t_i=\max\{u:u\in\Omega\}), then the conditional pdf is monotonically increasing in \beta_i and cannot be maximized. In this case, a median unbiased estimate (Hirji, Tsiatis, and Mehta 1989; Hirji and Tang 1998) \hat\beta_i is produced that satisfies \sum_{u} f_{\hat\beta_i}(u|{t}_0)=\frac{1}2, where the sum is over all possible values of Ti, and a Newton-Raphson-type algorithm is used to perform the search.

Likelihood ratio tests based on the conditional pdf are used to test the null H_0\colon\beta_i=0 against various alternatives. For testing against the alternative H_A\colon\beta_i \gt 0, the critical region for the UMP test consists of the upper tail of values for Ti in the permutation distribution. Thus, the one-sided significance level pG(ti;0) is the probability of a more extreme (greater) value:

p_{G}(t_i;0)=\sum_{u\ge t_i}f_{0}(u|{t}_0)
The one-sided significance level pL(ti;0) against H_A\colon\beta_i \lt 0 is
p_{L}(t_i;0)=\sum_{u\le t_i}f_{0}(u|{t}_0)
The minimum of these one-sided levels is reported when the ONESIDED option is specified. The two-sided significance level p(ti;0) against H_A\colon\beta_i \ne 0 is calculated as
p(ti;0) = 2min[pL(ti;0),pG(ti;0)]

An upper 100(1-2\epsilon)% confidence limit for \hat\beta_icorresponding to the observed ti is the solution \beta_U(t_i)of \epsilon=p_{L}(t_i,\beta_U(t_i)), while the lower confidence limit is the solution \beta_L(t_i) of \epsilon=p_{G}(t_i,\beta_L(t_i)). A Newton-Raphson procedure is used to search for the solutions.

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.