The SEQDESIGN Procedure

Fixed-Sample Clinical Trials

A clinical trial is a research study in consenting human beings to answer specific health questions. One type of trial is a treatment trial, which tests the effectiveness of an experimental treatment. An example is a planned experiment designed to assess the efficacy of a treatment in humans by comparing the outcomes in a group of patients who receive the test treatment with the outcomes in a comparable group of patients who receive a placebo control treatment, where patients in both groups are enrolled, treated, and followed over the same time period.

A clinical trial is conducted according to a plan called a protocol. The protocol provides a detailed description of the study. For a fixed-sample trial, the study protocol contains detailed information such as the null hypothesis, the one-sided or two-sided test, and the Type I and II error probability levels. It also specifies the test statistic and its associated critical values for the hypothesis test.

Generally, the efficacy of a new treatment is demonstrated by testing a hypothesis $H_{0}: \theta = 0$ in a clinical trial, where $\theta $ is the parameter of interest. For example, to test whether a population mean $\mu $ is greater than a specified value $\mu _{0}$, $\theta = \mu - \mu _{0}$ can be used with an alternative $\theta > 0$.

A one-sided test is a test of the hypothesis with either an upper (greater) or a lower (lesser) alternative, and a two-sided test is a test of the hypothesis with a two-sided alternative. The drug industry often prefers to use a one-sided test to demonstrate clinical superiority based on the argument that a study should not be run if the test drug would be worse (Chow, Shao, and Wang, 2003, p. 28). But in practice, two-sided tests are commonly performed in drug development (Senn, 1997, p. 161). For a fixed Type I error probability $\alpha $, the sample sizes required by one-sided and two-sided tests are different. See Senn (1997, pp. 161–167) for a detailed description of issues involving one-sided and two-sided tests.
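
To make the difference concrete, here is a minimal Python sketch (not part of the SEQDESIGN documentation) that compares the two cases by using the standard normal-approximation formula $n = \sigma ^{2} (z_{1-\alpha } + z_{1-\beta })^{2} / \delta ^{2}$ for a one-sample mean test; the effect size $\delta = 0.25$, $\sigma = 1$, $\alpha = 0.05$, and power $1 - \beta = 0.90$ are hypothetical values chosen for illustration.

```python
from scipy.stats import norm

# Normal-approximation sample size for a one-sample mean test:
#   n = sigma^2 * (z_{1-a} + z_{1-beta})^2 / delta^2,
# where a = alpha for a one-sided test and alpha/2 for a two-sided test.
def sample_size(delta, sigma, alpha=0.05, beta=0.10, sides=1):
    z_alpha = norm.ppf(1 - alpha / sides)
    z_beta = norm.ppf(1 - beta)
    return sigma**2 * (z_alpha + z_beta)**2 / delta**2

print(sample_size(delta=0.25, sigma=1.0, sides=1))  # about 137
print(sample_size(delta=0.25, sigma=1.0, sides=2))  # about 168
```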

For independent and identically distributed observations $y_{1}, y_{2}, \ldots , y_{n}$ of a random variable, the likelihood function for $\theta $ is

\[  L(\theta ) = \prod _{i=1}^{n} L_{i} (\theta )  \]

where $\theta $ is the population parameter and $L_{i}(\theta )$ is the probability or probability density of $y_{i}$. Using the likelihood function, two statistics can be derived that are useful for inference: the maximum likelihood estimator and the score statistic.
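
As a small illustration (assuming a normal model with known $\sigma = 1$ and made-up data), the following Python sketch evaluates the log likelihood, the log of the product above computed as a sum, over a grid of $\theta $ values; its grid maximizer approximates the MLE discussed in the next section.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical i.i.d. observations from N(theta, 1).
y = np.array([1.2, 0.7, 1.9, 0.3, 1.1])

# log L(theta) = sum_i log L_i(theta): the product in the text becomes a
# sum on the log scale, which is numerically safer than the raw product.
thetas = np.linspace(-1.0, 3.0, 401)
loglik = np.array([norm.logpdf(y, loc=t, scale=1.0).sum() for t in thetas])

print(thetas[loglik.argmax()])  # grid maximizer, close to the sample mean
print(y.mean())                 # 1.04
```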

Maximum Likelihood Estimator

The maximum likelihood estimate (MLE) of $\theta $ is the value $\hat{\theta }$ that maximizes the likelihood function for $\theta $. Under mild regularity conditions, $\hat{\theta }$ is an asymptotically unbiased estimate of $\theta $ with variance $1/E_{\theta }( I(\theta ) )$, where $I(\theta )$ is the Fisher information

\[  I(\theta ) = - \frac{ {\partial }^{2} \mr {log}( L(\theta ))}{\partial {\theta }^{2}}  \]

and $E_{\theta }(I(\theta ))$ is the expected Fisher information (Diggle et al., 2002, p. 340)

\[  E_{\theta }( I(\theta ) ) = - E_{\theta } \left( \frac{ {\partial }^{2} \mr {log}( L(\theta ))}{\partial {\theta }^{2}} \right)  \]

The score function for $\theta $ is defined as

\[  S( \theta ) = \frac{ \partial \,  \mr {log}( L(\theta )) }{\partial \theta }  \]

and usually, the MLE can be derived by solving the likelihood equation $S( \theta ) = 0$. Asymptotically, the MLE is normally distributed (Lindgren, 1976, p. 272):

\[  \hat{\theta } \sim N \left( \,  \theta , \,  \frac{1}{E_{\theta }( I(\theta ) )} \right)  \]
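
A quick simulation can check this asymptotic statement. The following Python sketch, a hedged illustration rather than anything from the SEQDESIGN documentation, uses a Bernoulli($p$) model in which the MLE is the sample proportion and the expected Fisher information for $n$ observations is $n/(p(1-p))$; the values of $n$, $p$, and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check hat(theta) ~ N(theta, 1 / E(I(theta))) for a Bernoulli(p) model:
# the MLE is the sample proportion, and the expected Fisher information
# for n observations is n / (p (1 - p)).
n, p, reps = 200, 0.3, 20000
phat = rng.binomial(n, p, size=reps) / n

print(phat.var())       # empirical variance of the MLE
print(p * (1 - p) / n)  # asymptotic variance 1 / E(I(theta))
```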

If the Fisher information $I(\theta )$ does not depend on $\theta $, then $I(\theta )$ is known. Otherwise, either the expected information evaluated at the MLE ${\hat\theta }$ ($E_{\theta ={\hat{\theta }}} (I(\theta ))$) or the observed information $I({\hat\theta })$ can be used for the Fisher information (Cox and Hinkley 1974, p. 302; Efron and Hinkley 1978, p. 458), where the observed Fisher information

\[  I({\hat{\theta }}) = - \left( \frac{ {\partial }^{2} \mr {log}( L(\theta )) }{ \partial {\theta }^{2} } \,  | \,  \theta ={\hat\theta } \right)  \]

If the Fisher information $I(\theta )$ does depend on $\theta $, the observed Fisher information is recommended for the variance of the maximum likelihood estimator (Efron and Hinkley, 1978, p. 457).

Thus, asymptotically, for large n,

\[  \hat{\theta } \sim N \left( \,  \theta , \,  \frac{1}{I} \right)  \]

where I is the information, either the expected Fisher information evaluated at the MLE ($E_{\theta ={\hat{\theta }}} (I(\theta ))$) or the observed Fisher information $I({\hat{\theta }})$.

So to test $H_{0}: \theta = 0$ versus $H_{1}: \theta \neq 0$, you can use the standardized Z test statistic

\[  Z = \frac{\hat\theta }{\sqrt {\mr {Var}({\hat\theta })}} = {\hat\theta } \,  \sqrt {I} \sim N \left( \,  0, \,  1 \right)  \]

and the two-sided p-value is given by

\[  \mr {Prob} (|Z| > |z_0|) = 2 \left( 1 - \Phi (|z_0|) \right)  \]

where $\Phi $ is the cumulative standard normal distribution function and $z_0$ is the observed Z statistic.
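
In code, this p-value is a one-line computation. The following Python fragment is illustrative, with a hypothetical observed statistic $z_0 = 2.10$:

```python
from scipy.stats import norm

z0 = 2.10                                # hypothetical observed Z statistic
p_two_sided = 2 * (1 - norm.cdf(abs(z0)))
print(p_two_sided)                       # about 0.0357
```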

If the BOUNDARYSCALE=SCORE option is specified in the SEQDESIGN procedure, the boundary values for the test statistic are displayed on the score statistic scale. With the standardized Z statistic, the score statistic $S= Z \sqrt {I}= \hat{\theta } I$ and

\[  S \sim N \left( \,  0, \,  I \right)  \]
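
As a small numerical illustration (the information level $I = 25$ is a made-up value), converting from the standardized scale to the score scale is a simple rescaling:

```python
import numpy as np

info = 25.0                 # hypothetical information level I
z = 1.96                    # a standardized boundary value
s = z * np.sqrt(info)       # score scale: S = Z * sqrt(I) = thetahat * I
print(s)                    # 9.8; under H0, S ~ N(0, I)
```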

Score Statistic

The score statistic is based on the score function for $\theta $,

\[  S( \theta ) = \frac{ \partial \,  \mr {log}( L(\theta )) }{\partial \theta }  \]

Under the null hypothesis $H_{0}: \theta = 0$, the score statistic $S(0)$ is the first derivative of the log likelihood evaluated at the null reference 0:

\[  S(0) = \frac{ \partial \,  \mr {log}( L(\theta )) }{\partial \theta } \,  | \,  \theta =0  \]

Under regularity conditions, $S(0)$ is asymptotically normally distributed with mean zero and variance $E_{\theta =0} (I(\theta ))$, the expected Fisher information evaluated at the null hypothesis $\theta =0$ (Kalbfleisch and Prentice, 1980, p. 45), where $I(\theta )$ is the Fisher information

\[  I(\theta ) = - \frac{ {\partial }^{2} \,  \mr {log}( L(\theta )) }{ \partial {\theta }^{2} }  \]

That is, for large n,

\[  S(0) \sim N \left( \,  0, \,  E_{\theta =0} (I(\theta )) \right)  \]

Asymptotically, the variance of the score statistic $S(0)$, $E_{\theta =0} (I(\theta ))$, can also be replaced by the expected Fisher information evaluated at the MLE $\theta ={\hat{\theta }}$ ($E_{\theta ={\hat{\theta }}} (I(\theta ))$), the observed Fisher information evaluated at the null hypothesis $\theta =0$ ($I(0)$), or the observed Fisher information evaluated at the MLE $\theta ={\hat{\theta }}$ ($I({\hat{\theta }})$) (Kalbfleisch and Prentice, 1980, p. 46), where

\[  I(0) = - \left( \frac{ {\partial }^{2} \mr {log}( L(\theta )) }{ \partial {\theta }^{2} } \,  | \,  \theta =0 \right)  \]
\[  I({\hat{\theta }}) = - \left( \frac{ {\partial }^{2} \mr {log}( L(\theta )) }{ \partial {\theta }^{2} } \,  | \,  \theta ={\hat\theta } \right)  \]

Thus, asymptotically, for large n,

\[  S(0) \sim N \left( \,  0, \,  I \right)  \]

where I is the information, either an expected Fisher information ($E_{\theta =0} (I(\theta ))$ or $E_{\theta ={\hat{\theta }}} (I(\theta ))$) or an observed Fisher information ($I(0)$ or $I({\hat{\theta }})$).

So to test $H_{0}: \theta = 0$ versus $H_{1}: \theta \neq 0$, you can use the standardized Z test statistic

\[  Z = \frac{S(0)}{\sqrt {I}}  \]
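
As an illustration of this form of the Z statistic, the following Python sketch applies the score test to a binomial proportion with $H_0\colon p = p_0$ (that is, $\theta = p - p_0 = 0$); the closed forms used for $S(0)$ and the expected information are standard results for the binomial likelihood, and the data values are hypothetical.

```python
import numpy as np

# Score test for a binomial proportion, H0: p = p0 (theta = p - p0 = 0).
# For the binomial likelihood,
#   S(0) = (sum(y) - n p0) / (p0 (1 - p0)),   I = n / (p0 (1 - p0)).
n, successes, p0 = 100, 62, 0.5           # hypothetical data
s0 = (successes - n * p0) / (p0 * (1 - p0))
info = n / (p0 * (1 - p0))

z = s0 / np.sqrt(info)
print(z)  # (62 - 50) / sqrt(100 * 0.25) = 2.4, the usual proportion test
```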

If the BOUNDARYSCALE=MLE option is specified in the SEQDESIGN procedure, the boundary values for the test statistic are displayed on the MLE scale. With the standardized Z statistic, the MLE statistic ${\hat\theta }= Z / \sqrt {I}= S(0) / I$ and

\[  {\hat\theta } \sim N \left( \,  0, \,  \frac{1}{I} \right)  \]
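
The companion conversion to the MLE scale is the inverse rescaling; again, the information level is a made-up value:

```python
import numpy as np

info = 25.0                   # hypothetical information level I
z = 1.96                      # a standardized boundary value
thetahat = z / np.sqrt(info)  # MLE scale: thetahat = Z / sqrt(I) = S(0) / I
print(thetahat)               # 0.392; under H0, thetahat ~ N(0, 1/I)
```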

One-Sample Test for Mean

The following one-sample test for mean is used to demonstrate fixed-sample clinical trials in the section One-Sided Fixed-Sample Tests in Clinical Trials and the section Two-Sided Fixed-Sample Tests in Clinical Trials.

Suppose ${y}_{1}, {y}_{2}, \ldots , {y}_{n}$ are n observations of a response variable Y from a normal distribution

\[  y_{i} \sim N \left( \,  \theta , \,  {\sigma }^{2} \right)  \]

where $\theta $ is the unknown mean and ${\sigma }^{2}$ is the known variance.

Then the log likelihood function for $\theta $ is

\[  \mr {log} (L(\theta )) = \sum _{j=1}^{n} -\frac{1}{2} \frac{{(y_ j-\theta )}^{2}}{\sigma ^{2}} + c  \]

where c is a constant. The first derivative is

\[  \frac{ {\partial } \mr {log}( L(\theta )) }{ \partial {\theta } } = \frac{1}{\sigma ^{2}} \sum _{j=1}^{n} (y_ j - \theta ) = \frac{n}{\sigma ^{2}} ( {\overline{y}} - \theta )  \]

where ${\overline{y}}$ is the sample mean.

Setting the first derivative to zero, the MLE of $\theta $ is $\hat{\theta } = {\overline{y}}$, the sample mean. The variance for $\hat{\theta }$ can be derived from the Fisher information

\[  I(\theta ) = - \frac{ {\partial }^{2} \mr {log}( L(\theta )) }{ \partial {\theta }^{2} } = \frac{n}{\sigma ^{2}}  \]

Since the Fisher information $I_0= I(\theta )$ does not depend on $\theta $ in this case, $1/I_0$ is used as the variance for $\hat{\theta }$. Thus the sample mean ${\overline{y}}$ has a normal distribution with mean $\theta $ and variance ${\sigma }^{2}/n$:

\[  \hat{\theta } = {\overline{y}} \sim N \left( \,  \theta , \,  \frac{1}{I_{0}} \right) = N \left( \,  \theta , \,  \frac{{\sigma }^{2}}{n} \right)  \]

Under the null hypothesis $H_{0}: \theta = 0$, the score statistic

\[  S(0) = \frac{ \partial \,  \mr {log}( L(\theta )) }{\partial \theta } | \theta =0 = \frac{n}{\sigma ^{2}} {\overline{y}}  \]

has mean zero and variance

\[  I(\theta ) = - \frac{ {\partial }^{2} \mr {log}( L(\theta )) }{ \partial {\theta }^{2} } = \frac{n}{\sigma ^{2}}  \]

With the MLE $\hat{\theta }$, the corresponding standardized statistic is computed as $Z= \hat{\theta } \sqrt {I_{0}} = {\overline{y}} / (\sigma / \sqrt {n})$, which has a normal distribution with variance 1:

\[  Z \sim N \left( \,  {\theta } \sqrt {I_{0}}, \,  1 \right) = N \left( \,  \frac{\theta }{\sigma / \sqrt {n}}, \,  1 \right)  \]

Also, the corresponding score statistic is computed as $S= \hat{\theta } I_{0} = n {\overline{y}} / {\sigma }^{2}$ and

\[  S \sim N \left( \,  {\theta } I_{0}, \,  I_{0} \right) = N \left( \,  \frac{n \theta }{{\sigma }^{2}}, \,  \frac{n}{{\sigma }^{2}} \right)  \]

which is identical to $S(0)$ computed under the null hypothesis $H_{0}: \theta = 0$.
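
The following Python sketch walks through these formulas end to end on simulated data; all values (true mean 0.4, $\sigma = 1$, $n = 50$) are hypothetical, and the script simply reproduces the MLE, the information, the standardized and score statistics, and the two-sided p-value defined above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
sigma, n = 1.0, 50
y = rng.normal(loc=0.4, scale=sigma, size=n)  # hypothetical data, theta = 0.4

thetahat = y.mean()              # MLE of theta: the sample mean
info = n / sigma**2              # Fisher information I_0 = n / sigma^2
z = thetahat * np.sqrt(info)     # standardized statistic Z = thetahat sqrt(I_0)
s = thetahat * info              # score statistic S = thetahat I_0 = S(0)
p = 2 * (1 - norm.cdf(abs(z)))   # two-sided p-value

print(thetahat, z, s, p)
```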

Note that if the variable Y does not have a normal distribution, then it is assumed that the sample size n is large such that the sample mean has an approximately normal distribution.
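
A quick simulation (illustrative only) shows this caveat in action: for skewed exponential data, the sampling distribution of the mean loses its skewness as n grows.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

# Sampling distribution of the mean of n exponential(1) observations;
# its theoretical skewness is 2 / sqrt(n), shrinking toward normality.
for n in (5, 50, 500):
    means = rng.exponential(scale=1.0, size=(10000, n)).mean(axis=1)
    print(n, skew(means))  # roughly 0.89, 0.28, 0.09
```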