The SEQDESIGN Procedure

Statistical Assumptions for Group Sequential Designs

The SEQDESIGN procedure assumes that with a total number of stages K, the sequence of the standardized test statistics $\{  Z_{1}, Z_{2}, \ldots , Z_{K} \} $ has the canonical joint distribution with information levels $\{  I_{1}, I_{2}, \ldots , I_{K} \} $ for the parameter $\theta $ (Jennison and Turnbull, 2000, p. 49):

  • $(Z_{1}, Z_{2}, \ldots , Z_{K})$ is multivariate normal

  • $Z_{k} \sim N \left( \,  \theta \sqrt {I_{k}}, \,  1 \right), \, \,  k= 1, 2, \ldots , K$

  • $\mr {Cov}( Z_{k_1}, Z_{k_2})= \sqrt {(I_{k_1} / I_{k_2})}$,    $1 \leq k_1 \leq k_2 \leq K$

In terms of the maximum likelihood estimator, $\hat{\theta }_{k}= Z_{k} / \sqrt {I_{k}}$, $k= 1, 2, \ldots , K$, the canonical joint distribution can be expressed as follows:

  • $(\hat{\theta }_{1}, \hat{\theta }_{2}, \ldots , \hat{\theta }_{K})$ is multivariate normal

  • $\hat{\theta }_{k} \sim N \left( \,  \theta , \,  1 / I_{k} \right), \, \,  k= 1, 2, \ldots , K$

  • $\mr {Cov}( \hat{\theta }_{k_1}, \hat{\theta }_{k_2})= 1 / I_{k_2}$,    $1 \leq k_1 \leq k_2 \leq K$
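As a brief check, the last property follows from the covariance of the standardized statistics given earlier, because $\hat{\theta }_{k}= Z_{k} / \sqrt {I_{k}}$ is a rescaling of $Z_{k}$. For $1 \leq k_1 \leq k_2 \leq K$:

\[  \mr {Cov}( \hat{\theta }_{k_1}, \hat{\theta }_{k_2}) = \frac{\mr {Cov}( Z_{k_1}, Z_{k_2})}{\sqrt {I_{k_1} I_{k_2}}} = \frac{\sqrt {I_{k_1} / I_{k_2}}}{\sqrt {I_{k_1} I_{k_2}}} = \frac{1}{I_{k_2}}  \]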

Furthermore, in terms of the score statistics $S_{k}= Z_{k} \sqrt {I_{k}}$, $k= 1, 2, \ldots , K$, the canonical joint distribution can be expressed as follows:

  • $(S_{1}, S_{2}, \ldots , S_{K})$ is multivariate normal

  • $S_{k} \sim N \left( \,  \theta \,  I_{k}, \,  I_{k} \right), \, \,  k= 1, 2, \ldots , K$

  • $\mr {Cov}( S_{k_1}, S_{k_2})= \mr {Var}( S_{k_1}) = I_{k_1}$,    $1 \leq k_1 \leq k_2 \leq K$

That is, the increments $S_{1}$, $S_{2}-S_{1}$, …, and $S_{K}-S_{(K-1)}$ are independently distributed.
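The three parameterizations above can be checked against each other numerically. The following Python sketch is an illustration only and is not part of the SEQDESIGN procedure; the function name `canonical_covariances` and the information levels are hypothetical. It builds the covariance matrix of the standardized statistics, rescales it to the MLE and score scales, and confirms that the score increments have a diagonal covariance matrix, which for jointly normal statistics means that the increments are independent.

```python
import numpy as np

def canonical_covariances(info):
    """Covariance matrices of Z, theta-hat, and S under the canonical joint distribution.

    `info` holds increasing information levels I_1 < I_2 < ... < I_K (hypothetical helper).
    """
    info = np.asarray(info, dtype=float)
    lo = np.minimum.outer(info, info)      # I_{k1} for k1 <= k2
    hi = np.maximum.outer(info, info)      # I_{k2} for k1 <= k2
    cov_z = np.sqrt(lo / hi)               # Cov(Z_k1, Z_k2) = sqrt(I_k1 / I_k2)
    scale = np.outer(np.sqrt(info), np.sqrt(info))
    cov_theta = cov_z / scale              # theta-hat_k = Z_k / sqrt(I_k)
    cov_s = cov_z * scale                  # S_k = Z_k * sqrt(I_k)
    return cov_z, cov_theta, cov_s

# Hypothetical information levels for a four-stage design.
info = [10.0, 20.0, 30.0, 40.0]
cov_z, cov_theta, cov_s = canonical_covariances(info)

# Differencing matrix: maps (S_1, ..., S_K) to (S_1, S_2 - S_1, ..., S_K - S_{K-1}).
K = len(info)
A = np.eye(K) - np.eye(K, k=-1)
cov_increments = A @ cov_s @ A.T

print(np.round(cov_theta, 4))       # entries equal 1 / I_{max(k1, k2)}
print(np.round(cov_s, 4))           # entries equal I_{min(k1, k2)}
print(np.round(cov_increments, 4))  # diagonal: uncorrelated increments
```

The off-diagonal entries of `cov_increments` vanish, and its diagonal holds the information gained at each stage.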

If the test statistic is computed from data that are not normally distributed, such as data from a binomial distribution, then it is assumed that the sample is large enough for the statistic to have an approximately normal distribution.

If the increments $S_{1}$, $S_{2}-S_{1}$, …, and $S_{K}-S_{(K-1)}$ are not independently distributed, then it is inappropriate to use group sequential methods in the SEQDESIGN procedure. One such example is the Gehan statistic, which is a weighted log-rank statistic for censored data. See Jennison and Turnbull (2000, pp. 232–233, 276–277) and Proschan, Lan, and Wittes (2006, pp. 150–151) for a description of statistics with nonindependent increments.

If a trial stops at an early interim stage with only a small number of responses observed, it can lead to a distrust of the statistical findings, which rely on the assumption that the sample is large (Whitehead, 1997, p. 167). A group sequential design can be specified such that at the first interim analysis, there are a sufficient number of responses to ensure that the analysis to be conducted is both reliable and persuasive (Whitehead, 1997, p. 167).

Alternatively, a method such as the O’Brien-Fleming method can be used to derive conservative stopping boundary values at very early stages, which makes early stopping less likely. That is, the trial is stopped in early stages only with overwhelming evidence.
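To see how conservative such early boundaries are, the following Python sketch approximates O’Brien-Fleming-type boundaries on the standardized Z scale by simulation. It is an illustration under stated assumptions, not the algorithm that the SEQDESIGN procedure uses: it assumes a two-sided test, equally spaced information levels, and the boundary shape $c_k = C \sqrt{K/k}$, and it calibrates the constant $C$ by Monte Carlo under the null hypothesis. The function name `obf_boundaries_mc` and its arguments are hypothetical.

```python
import numpy as np

def obf_boundaries_mc(K, alpha=0.05, n_sim=200_000, seed=0):
    """Monte Carlo approximation of two-sided O'Brien-Fleming-type Z boundaries.

    Assumes equally spaced information and the shape c_k = C * sqrt(K / k).
    """
    rng = np.random.default_rng(seed)
    # Under H0 with unit information per stage, S_k is a sum of k independent N(0, 1) increments.
    s = np.cumsum(rng.standard_normal((n_sim, K)), axis=1)
    k = np.arange(1, K + 1)
    z = s / np.sqrt(k)                       # Z_k = S_k / sqrt(I_k)
    # A trial crosses some boundary exactly when max_k |Z_k| / sqrt(K / k) >= C.
    ratio = np.abs(z) / np.sqrt(K / k)
    c = np.quantile(ratio.max(axis=1), 1.0 - alpha)
    return c * np.sqrt(K / k)

bounds = obf_boundaries_mc(K=5)
for stage, b in enumerate(bounds, start=1):
    print(f"stage {stage}: reject H0 if |Z| >= {b:.2f}")
```

The boundary values decrease from stage to stage, so a very large standardized statistic is needed to stop at the first interim analysis, and the final-stage value is only slightly larger than the fixed-sample critical value.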

A simple example of a group sequential test is the test of a normal mean, $H_{0}: \mu = \mu _{0}$. Suppose ${y}_{1}, {y}_{2}, \ldots , {y}_{n}$ are n observations of a response variable Y in a data set from a normal distribution with an unknown mean $\mu $ and a known variance ${\sigma }^{2}$. Then the maximum likelihood estimate of $\mu $ is the sample mean

\[  {\overline y} = \frac{1}{n} \sum _{j=1}^{n} y_{j}  \]

The sample mean has a normal distribution with mean $\mu $ and variance ${\sigma }^{2}/n$:

\[  {\overline y} \sim N \left( \,  \mu , \,  \frac{{\sigma }^{2}}{n} \right)  \]

An equivalent hypothesis for $\mu = \mu _{0}$ is $H_{0}: \theta = 0$, where $\theta = \mu - \mu _{0}$. The maximum likelihood estimate of $\theta $ is

\[  \hat{\theta }= {\overline y} - \mu _{0} \sim N \left( \,  \theta , \,  {I_{0}}^{-1} \right)  \]

where the information is $I_{0} = n/{\sigma }^{2}$.

For a group sequential test with K stages, let $N_1, N_2, \ldots , N_K$ be the cumulative numbers of observations available at stages $1, 2, \ldots , K$. At stage k, the sample mean is computed as

\[  {\overline{y}}_{k} = \frac{1}{N_{k}} \sum _{j=1}^{N_{k}} {{y}_{kj}}  \]

where ${y}_{kj}$ is the value of the jth observation available at the kth stage and $N_{k}$ is the cumulative sample size at stage k, which includes the $N_{k-1}$ observations collected at previous stages and the $N_{k}-N_{k-1}$ observations collected at the current stage.

The maximum likelihood estimate of $\theta $ at stage k is

\[  \hat{\theta }_ k = {\overline{y}}_{k} - \mu _{0} \sim N \left( \,  \theta , \,  {I_{k}}^{-1} \right)  \]

where the information

\[  I_{k} = \frac{1}{\mr {Var}({\overline{y}}_{k})} = \frac{N_{k}}{{\sigma }^{2}}  \]

is the inverse of the variance.

Thus, the standardized statistic is

\[  Z_{k} = \hat{\theta }_ k \sqrt {I_{k}} = ({\overline{y}}_{k} - \mu _{0}) \sqrt {I_{k}} \sim N \left( \,  \theta \sqrt {I_{k}}, \,  1 \right)  \]
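The stage-wise computation above can be sketched in Python as follows. This is a minimal illustration, not SEQDESIGN output: the helper `stagewise_z`, the simulated data, the cumulative sample sizes, and the values of `mu0` and `sigma` are all hypothetical.

```python
import numpy as np

def stagewise_z(y, cum_sizes, mu0, sigma):
    """Return (N_k, I_k, theta_hat_k, Z_k) for each stage of a one-sample normal-mean test.

    `cum_sizes` holds the cumulative sample sizes N_1 < ... < N_K and `sigma` is the
    known standard deviation (hypothetical helper for illustration).
    """
    y = np.asarray(y, dtype=float)
    rows = []
    for n_k in cum_sizes:
        ybar_k = y[:n_k].mean()             # sample mean of the first N_k observations
        info_k = n_k / sigma**2             # I_k = N_k / sigma^2
        theta_hat_k = ybar_k - mu0          # MLE of theta = mu - mu0
        z_k = theta_hat_k * np.sqrt(info_k)
        rows.append((n_k, info_k, theta_hat_k, z_k))
    return rows

# Simulated responses with true mean 0.3 and known sigma = 1 (assumed values).
rng = np.random.default_rng(1)
y = rng.normal(loc=0.3, scale=1.0, size=120)

results = stagewise_z(y, cum_sizes=[30, 60, 90, 120], mu0=0.0, sigma=1.0)
for n_k, info_k, theta_hat_k, z_k in results:
    print(f"N={n_k:4d}  I={info_k:7.2f}  theta_hat={theta_hat_k:7.4f}  Z={z_k:7.4f}")
```

Each stage reuses all observations collected so far, which is why the statistics at different stages are correlated.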

The covariance of $Z_{k_1}$ and $Z_{k_2}$, $1 \leq k_1 \leq k_2 \leq K$, can be expressed as

\[  \mr {Cov}( Z_{k_1}, Z_{k_2}) = \frac{1}{\sqrt {(I_{k_1} I_{k_2})}} \;  \mr {Cov}( S_{k_1}, S_{k_2} )  \]

where $S_{k_1}= Z_{k_1} \sqrt {I_{k_1}}$ and $S_{k_2}= Z_{k_2} \sqrt {I_{k_2}}$.

Since $S_{k_2} - S_{k_1}$ is independent of $S_{k_1}$, $\mr {Cov}( S_{k_1}, S_{k_2} ) = \mr {Var}( S_{k_1} ) = I_{k_1}$ and

\[  \mr {Cov}( Z_{k_1}, Z_{k_2}) = \frac{1}{\sqrt {( I_{k_1} I_{k_2} )}} I_{k_1} = \sqrt { I_{k_1} / I_{k_2} }  \]

Thus, the statistics $\{  Z_{1}, Z_{2}, \ldots , Z_{K} \} $ have the canonical joint distribution with information levels $\{  I_{1}, I_{2}, \ldots , I_{K} \} $ for the parameter $\theta = \mu - \mu _{0}$. See the section Applicable One-Sample Tests and Sample Size Computation, the section Applicable Two-Sample Tests and Sample Size Computation, and the section Applicable Regression Parameter Tests and Sample Size Computation for more examples of applicable tests in group sequential trials.
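The covariance result for the normal-mean example can also be checked empirically. The following Python sketch simulates many hypothetical trials under the null hypothesis, computes the standardized statistic at each stage from the cumulative data, and compares the sample covariance matrix with $\sqrt {I_{k_1} / I_{k_2}}$. The sample sizes, the value of $\sigma $, and the number of simulated trials are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, mu0 = 2.0, 0.0                     # assumed known standard deviation and null mean
cum_sizes = np.array([25, 50, 100])       # hypothetical cumulative sample sizes N_1, N_2, N_3
n_trials = 20_000                         # number of simulated trials

# Each row is one trial; columns are observations in order of collection.
y = rng.normal(loc=mu0, scale=sigma, size=(n_trials, cum_sizes[-1]))
cum_sums = np.cumsum(y, axis=1)[:, cum_sizes - 1]
ybar = cum_sums / cum_sizes               # stage-wise sample means
info = cum_sizes / sigma**2               # I_k = N_k / sigma^2
z = (ybar - mu0) * np.sqrt(info)          # standardized statistics Z_k

empirical = np.cov(z, rowvar=False)
theoretical = np.sqrt(np.minimum.outer(info, info) / np.maximum.outer(info, info))

print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```

The two matrices should agree up to simulation error, with unit variances on the diagonal.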