The GLMPOWER Procedure

Contrasts in Fixed-Effect Univariate Models

The univariate linear model has the form

\[  \mb {y} = \mb {X} \bbeta + \bepsilon  \]

where $\mb {y}$ is the $N \times 1$ vector of responses, $\mb {X}$ is the $N \times k$ design matrix, $\bbeta $ is the $k \times 1$ vector of model parameters corresponding to the columns of $\mb {X}$, and $\bepsilon $ is an $N \times 1$ vector of errors with

\[  \epsilon _1, \ldots , \epsilon _ N \sim \mr {N}(0,\sigma ^2) \quad \mr {(iid)}  \]

In PROC GLMPOWER, the model parameters $\bbeta $ are not specified directly. Instead, they are specified indirectly through $\mb {y^\star }$, which represents either conjectured response means or typical response values for each design profile. The $\mb {y^\star }$ values appear as the dependent variable in the MODEL statement. The vector $\bbeta $ is obtained from $\mb {y^\star }$ according to the least squares equation,

\[  \bbeta = (\mb {X}’\mb {X})^{-}\mb {X}’ \mb {y^\star }  \]
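A minimal NumPy sketch of this step follows. It assumes a hypothetical 2 × 2 design (factors A and B, two levels each) coded with a less-than-full-rank dummy parameterization whose columns are intercept, A1, A2, B1, B2; the names `X_ess` and `y_star_ess` are illustrative only. The Moore-Penrose pseudoinverse `np.linalg.pinv` stands in for the generalized inverse $(\mb {X}’\mb {X})^{-}$. Because it is only one of many generalized inverses, the individual elements of $\bbeta $ can differ from what PROC GLMPOWER reports, although estimable quantities such as the cell means $\mb {X}\bbeta $ do not.

```python
import numpy as np

# Hypothetical essence design matrix for a 2 x 2 design.
# Columns: intercept, A1, A2, B1, B2 (less-than-full-rank dummy coding).
X_ess = np.array([
    [1, 1, 0, 1, 0],  # A=1, B=1
    [1, 1, 0, 0, 1],  # A=1, B=2
    [1, 0, 1, 1, 0],  # A=2, B=1
    [1, 0, 1, 0, 1],  # A=2, B=2
], dtype=float)

# Hypothetical conjectured means y* for the four design profiles (additive).
y_star_ess = np.array([10.0, 12.0, 14.0, 16.0])

# Balanced design with N = 8: each profile appears twice in X.
n = np.array([2, 2, 2, 2])
X = np.repeat(X_ess, n, axis=0)
y_star = np.repeat(y_star_ess, n)

# beta = (X'X)^- X' y*, using the pseudoinverse as the generalized inverse.
beta = np.linalg.pinv(X.T @ X) @ X.T @ y_star

# Estimable quantities do not depend on the choice of generalized inverse.
cell_means = X_ess @ beta       # matches y_star_ess, since y* has no interaction
a_contrast = beta[1] - beta[2]  # A1 - A2: marginal mean 11 minus 15, i.e. -4
print(cell_means, a_contrast)
```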

Note that, in general, there is not a one-to-one mapping between $\mb {y^\star }$ and $\bbeta $. Many different scenarios for $\mb {y^\star }$ might lead to the same $\bbeta $. If you specify $\mb {y^\star }$ with the intention of representing cell means, keep in mind that PROC GLMPOWER allows scenarios that are not valid cell means according to the model that is specified in the MODEL statement. For example, if $\mb {y^\star }$ exhibits an interaction effect but the corresponding interaction term is left out of the model, then the cell means ($\mb {X} \bbeta $) that are derived from $\bbeta $ differ from $\mb {y^\star }$. In particular, the cell means that are derived in this way are the projection of $\mb {y^\star }$ onto the model space.
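The projection property can be checked numerically. The sketch below reuses the hypothetical 2 × 2 coding (intercept, A1, A2, B1, B2), keeps only the main effects in the model, and feeds in conjectured means that contain an interaction. With equal weights, the derived cell means $\mb {X}\bbeta $ are the least squares projection of $\mb {y^\star }$ onto the main-effects space, which for a balanced 2 × 2 layout is row mean plus column mean minus grand mean in each cell.

```python
import numpy as np

# Hypothetical 2 x 2 design, main-effects-only model (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],  # A=1, B=1
    [1, 1, 0, 0, 1],  # A=1, B=2
    [1, 0, 1, 1, 0],  # A=2, B=1
    [1, 0, 1, 0, 1],  # A=2, B=2
], dtype=float)

# Conjectured means with an A x B interaction: an additive pattern would put
# 16 in cell (2,2); 20 is used instead (hypothetical values).
y_star = np.array([10.0, 12.0, 14.0, 20.0])

beta = np.linalg.pinv(X_ess.T @ X_ess) @ X_ess.T @ y_star
cell_means = X_ess @ beta  # projection of y* onto the main-effects space

# Additive fit by hand: row mean + column mean - grand mean.
table = y_star.reshape(2, 2)  # rows: levels of A, columns: levels of B
by_hand = (table.mean(axis=1, keepdims=True)
           + table.mean(axis=0, keepdims=True)
           - table.mean()).ravel()

print(cell_means)  # differs from y_star: the interaction is projected away
print(by_hand)
```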

It is convenient in power analysis to parameterize the design matrix $\mb {X}$ in three parts, $\{ \ddot{\mb {X}}, \mb {w}, N\} $, defined as follows:

  1. The $q \times k$ essence design matrix $\ddot{\mb {X}}$ is the collection of unique rows of $\mb {X}$. Its rows are sometimes referred to as design profiles. Here, $q \le N$ is defined simply as the number of unique rows of $\mb {X}$.

  2. The $q \times 1$ weight vector $\mb {w}$ gives the relative proportions of the design profiles, and $\mb {W} = \mr {diag}(\mb {w})$. Row $i$ of $\ddot{\mb {X}}$ is included in the design $w_ i$ times for every $w_ j$ times that row $j$ is included. The weights are assumed to be standardized (that is, they sum to 1).

  3. The total sample size is $N$, the number of rows in $\mb {X}$. If you gather $N w_ i = n_ i$ copies of the $i$th row of $\ddot{\mb {X}}$, for $i = 1,\ldots ,q$, then you obtain $\mb {X}$.
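The three-part parameterization can be sketched in NumPy as follows. The helper `expand_design` and the weights `[1, 1, 2, 2]` are hypothetical; the helper standardizes the weights, rejects any $N$ for which some $N w_ i$ is not a whole number, and stacks $n_ i$ copies of each design profile. The reverse direction uses `np.unique` to recover the unique rows (returned in sorted order, not necessarily the original order) and their counts.

```python
import numpy as np

def expand_design(X_ess, w, N):
    """Rebuild X by stacking n_i = N * w_i copies of row i of X_ess.

    w holds relative proportions; it is standardized to sum to 1 first.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    n_real = N * w
    n = np.rint(n_real).astype(int)
    if not np.allclose(n_real, n):
        raise ValueError("N * w_i must be a whole number for every profile")
    return np.repeat(X_ess, n, axis=0), n

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

# Hypothetical relative weights: profiles 3 and 4 are twice as frequent.
X, n = expand_design(X_ess, w=[1, 1, 2, 2], N=12)
print(X.shape, n)

# Going the other way: the unique rows of X give back the essence design
# matrix (in sorted order), and the row counts give the weights.
rows, counts = np.unique(X, axis=0, return_counts=True)
w_recovered = counts / counts.sum()
print(rows, w_recovered)
```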

The preceding quantities are derived from PROC GLMPOWER syntax as follows:

  • Values for $\ddot{\mb {X}}$, $\mb {y^\star }$, and $\mb {w}$ are specified in the exemplary data set (named in the DATA= option in the PROC GLMPOWER statement), and the corresponding variables are identified in the CLASS, MODEL, and WEIGHT statements.

  • $N$ is specified in the NTOTAL= option in the POWER statement.

It is useful to express the crossproduct matrix $\mb {X}’\mb {X}$ in terms of these three parts,

\[  \mb {X}’\mb {X} = N \ddot{\mb {X}}’ \mb {W} \ddot{\mb {X}}  \]

because this expression separates the factor that depends on sample size ($N$) from the factor that depends only on the design structure ($\ddot{\mb {X}}’ \mb {W} \ddot{\mb {X}}$).
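A quick numerical check of this factorization, assuming the same hypothetical 2 × 2 essence design and standardized weights $(1, 1, 2, 2)/6$: for each total sample size tried, the crossproduct matrix of the expanded design equals $N$ times the fixed design-only part.

```python
import numpy as np

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

w = np.array([1.0, 1.0, 2.0, 2.0]) / 6.0  # hypothetical standardized weights
W = np.diag(w)
design_part = X_ess.T @ W @ X_ess          # depends only on the design structure

checks = []
for N in (12, 24, 48):
    n = np.rint(N * w).astype(int)         # n_i = N * w_i copies of each profile
    X = np.repeat(X_ess, n, axis=0)
    # X'X grows linearly in N while the design part stays fixed.
    checks.append(np.allclose(X.T @ X, N * design_part))
print(checks)
```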

A general linear hypothesis for the univariate model has the form

\begin{align*}  H_0\colon & \mb {L} \bbeta = \btheta _0 \\ H_ A\colon & \mb {L} \bbeta \ne \btheta _0 \end{align*}

where $\mb {L}$ is an $l \times k$ contrast matrix with rank $r_ L$ and $\btheta _0$ is the null value (usually just a vector of zeros).

Note that model effect tests are just contrasts that use special forms of $\mb {L}$. Thus, this scheme covers both effect tests (which are specified in the MODEL statement and the EFFECTS= option in the POWER statement) and custom contrasts (which are specified in the CONTRAST statement).
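To make this concrete, the sketch below writes out hypothetical contrast matrices for the 2 × 2 main-effects coding (intercept, A1, A2, B1, B2): a one-row $\mb {L}$ for the A effect, a two-row $\mb {L}$ for a joint test of A and B, and a row that is not a valid contrast because it is not estimable. It uses the standard criterion that $\mb {L}\bbeta $ is estimable exactly when $\mb {L}(\mb {X}’\mb {X})^{-}\mb {X}’\mb {X} = \mb {L}$; the helper `is_estimable` is illustrative, not part of any SAS interface.

```python
import numpy as np

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)
XtX = X_ess.T @ X_ess
G = np.linalg.pinv(XtX)  # one generalized inverse of X'X

def is_estimable(L, XtX, G, tol=1e-8):
    """L beta is estimable iff L (X'X)^- (X'X) = L."""
    L = np.atleast_2d(L)
    return np.allclose(L @ G @ XtX, L, atol=tol)

L_A = np.array([[0, 1, -1, 0, 0]], dtype=float)      # effect test for A
L_AB = np.array([[0, 1, -1, 0, 0],
                 [0, 0, 0, 1, -1]], dtype=float)     # joint test of A and B
L_bad = np.array([[0, 1, 0, 0, 0]], dtype=float)     # A1 alone: not estimable

for name, L in [("A", L_A), ("A and B", L_AB), ("A1 alone", L_bad)]:
    # rank r_L and estimability of each candidate contrast
    print(name, np.linalg.matrix_rank(L), is_estimable(L, XtX, G))
```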

The model degrees of freedom $\mr {DF_ M}$ are equal to the rank of $\mb {X}$, denoted $r_ X$. The error degrees of freedom $\mr {DF_ E}$ are equal to $N - r_ X$. The sample size $N$ must be at least $\mr {DF_ M}$ plus the number of covariates.
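For the hypothetical 2 × 2 main-effects coding with three observations per profile, the degrees of freedom follow directly from the rank of the expanded design matrix, as in this short sketch (the allocation is an assumption chosen for illustration).

```python
import numpy as np

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)
n = np.array([3, 3, 3, 3])  # hypothetical balanced allocation, N = 12
X = np.repeat(X_ess, n, axis=0)

N = X.shape[0]
r_X = np.linalg.matrix_rank(X)  # 1 (intercept) + 1 (A) + 1 (B) = 3
df_m = r_X                      # model degrees of freedom
df_e = N - r_X                  # error degrees of freedom
print(N, df_m, df_e)
```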

The test statistic is

\[  F = \frac{\left(\frac{\mr {SS_ H}}{r_ L}\right)}{\hat{\sigma }^2}  \]

where

\begin{align*}  \mr {SS_ H} & = \left(\mb {L} \hat{\bbeta } - \btheta _0 \right)’\left(\mb {L} \left(\mb {X}’\mb {X}\right)^{-} \mb {L}^\prime \right)^{-1} \left(\mb {L} \hat{\bbeta } - \btheta _0 \right) \\ \hat{\bbeta } & = (\mb {X}’\mb {X})^{-}\mb {X}’ \mb {y} \\ \hat{\sigma }^2 & = \frac{1}{\mr {DF_ E}} \left( \mb {y} - \mb {X} \hat{\bbeta } \right)’ \left( \mb {y} - \mb {X} \hat{\bbeta } \right) \end{align*}
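The following sketch computes the test statistic for one simulated data set. It assumes the hypothetical 2 × 2 main-effects design with three observations per profile, conjectured means $(10, 12, 14, 16)$, $\sigma = 2$, and $\mb {L}$ for the A effect with $\btheta _0 = 0$. As a cross-check, the quadratic form $\mr {SS_ H} = (\mb {L}\hat{\bbeta } - \btheta _0)’(\mb {L}(\mb {X}’\mb {X})^{-}\mb {L}^\prime )^{-1}(\mb {L}\hat{\bbeta } - \btheta _0)$ should equal the increase in error sum of squares when A is dropped from the model. The helpers `sse` and `f_test` are illustrative, and `np.linalg.solve` assumes $\mb {L}$ has full row rank.

```python
import numpy as np

rng = np.random.default_rng(1984)  # arbitrary seed

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)
n = np.array([3, 3, 3, 3])                   # hypothetical balanced allocation
X = np.repeat(X_ess, n, axis=0)
y_star = np.array([10.0, 12.0, 14.0, 16.0])  # hypothetical true cell means
sigma = 2.0                                  # hypothetical error SD
y = np.repeat(y_star, n) + rng.normal(0.0, sigma, size=X.shape[0])

def sse(X, y):
    """Error sum of squares from a least squares fit of y on X."""
    resid = y - X @ (np.linalg.pinv(X.T @ X) @ X.T @ y)
    return resid @ resid

def f_test(X, y, L, theta0):
    """F statistic for H0: L beta = theta0 (L assumed to have full row rank)."""
    G = np.linalg.pinv(X.T @ X)              # generalized inverse of X'X
    beta_hat = G @ X.T @ y
    df_e = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2_hat = sse(X, y) / df_e
    d = L @ beta_hat - theta0
    ss_h = d @ np.linalg.solve(L @ G @ L.T, d)
    r_L = np.linalg.matrix_rank(L)
    return (ss_h / r_L) / sigma2_hat, ss_h, df_e

L_A = np.array([[0.0, 1.0, -1.0, 0.0, 0.0]])  # test of the A main effect
F, ss_h, df_e = f_test(X, y, L_A, np.zeros(1))

# Cross-check: SS_H equals SSE(reduced) - SSE(full), where the reduced
# model drops A (columns: intercept, B1, B2).
ss_h_reduced = sse(X[:, [0, 3, 4]], y) - sse(X, y)
print(F, df_e, ss_h, ss_h_reduced)
```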

Under $H_0$, $F \sim F(r_ L, \mr {DF_ E})$. Under $H_ A$, F is distributed as $F(r_ L, \mr {DF_ E}, \lambda )$ with noncentrality

\[  \lambda = N \left(\mb {L} \bbeta - \btheta _0 \right)’\left(\mb {L} \left(\ddot{\mb {X}}’ \mb {W} \ddot{\mb {X}} \right)^{-} \mb {L}^\prime \right)^{-1} \left(\mb {L} \bbeta - \btheta _0 \right) \sigma ^{-2}  \]

The value of $\sigma $ is specified in the STDDEV= option in the POWER statement.
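A sketch of the noncentrality computation from the three-part parameterization, assuming the hypothetical balanced 2 × 2 design (weights $1/4$ each), conjectured means $(10, 12, 14, 16)$, $\sigma = 2$ as STDDEV= would supply, $N = 12$ as NTOTAL= would supply, and the A-effect contrast. For these inputs $\mb {L}\bbeta = -4$ and $\mb {L}(\ddot{\mb {X}}’\mb {W}\ddot{\mb {X}})^{-}\mb {L}^\prime = 4$, so $\lambda = 4N/\sigma ^2 = 12$.

```python
import numpy as np

# Hypothetical 2 x 2 essence design (intercept, A1, A2, B1, B2).
X_ess = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)
w = np.full(4, 0.25)                          # hypothetical equal weights
W = np.diag(w)
y_star = np.array([10.0, 12.0, 14.0, 16.0])   # hypothetical conjectured means
sigma = 2.0                                   # value STDDEV= would supply
N = 12                                        # value NTOTAL= would supply
L = np.array([[0.0, 1.0, -1.0, 0.0, 0.0]])    # A main effect
theta0 = np.zeros(1)

M = X_ess.T @ W @ X_ess                 # design-only part: X'X / N
G = np.linalg.pinv(M)                   # generalized inverse of M
beta = G @ X_ess.T @ W @ y_star         # same as least squares on the expanded X
d = L @ beta - theta0
lam = N * (d @ np.linalg.solve(L @ G @ L.T, d)) / sigma**2
print(lam)  # approximately 12.0
```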

Muller and Peterson (1984) give the exact power of the test as

\[  \mr {power} = P\left(F(r_ L, \mr {DF_ E}, \lambda ) \ge F_{1-\alpha }(r_ L, \mr {DF_ E})\right)  \]

The value of $\alpha $ is specified in the ALPHA= option in the POWER statement.
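The power formula translates directly into SciPy calls: `scipy.stats.f.ppf` gives the central critical value $F_{1-\alpha }(r_ L, \mr {DF_ E})$ and `scipy.stats.ncf.sf` gives the upper tail of the noncentral F distribution. This is a sketch using SciPy as a stand-in, not a description of how PROC GLMPOWER evaluates the distribution internally; the inputs ($\lambda = 12$, $r_ L = 1$, $\mr {DF_ E} = 9$, $\alpha = 0.05$) are the hypothetical values from the 2 × 2 illustration.

```python
from scipy import stats

def glm_power(lam, r_L, df_e, alpha=0.05):
    """Power = P(F(r_L, df_e, lam) >= F_{1-alpha}(r_L, df_e))."""
    crit = stats.f.ppf(1.0 - alpha, r_L, df_e)  # central F critical value
    return stats.ncf.sf(crit, r_L, df_e, lam)   # noncentral F upper tail

# Hypothetical inputs: lambda = 12, r_L = 1, DF_E = 12 - 3 = 9, ALPHA=0.05.
power = glm_power(lam=12.0, r_L=1, df_e=9, alpha=0.05)
print(round(power, 4))
```

As $\lambda $ shrinks toward 0, the computed power approaches $\alpha $, which is a convenient sanity check.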

Sample size is computed by inverting the power equation.
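One simple way to invert the power equation is a linear search over admissible total sample sizes, using the facts that $\lambda $ grows linearly in $N$ and that $\mr {DF_ E} = N - r_ X$. The sketch below assumes that approach; it is not a description of the search PROC GLMPOWER actually performs. The helpers `power_at_n` and `min_ntotal` are hypothetical, and the inputs come from the 2 × 2 illustration, where $\lambda = 4N/\sigma ^2 = N$ for $\sigma = 2$ and $N$ must be a multiple of 4 to keep the weights equal.

```python
from scipy import stats

def power_at_n(N, lam_per_unit, r_L, r_X, alpha):
    """Power at total sample size N, with lambda = N * lam_per_unit."""
    df_e = N - r_X
    crit = stats.f.ppf(1.0 - alpha, r_L, df_e)
    return stats.ncf.sf(crit, r_L, df_e, N * lam_per_unit)

def min_ntotal(target, lam_per_unit, r_L, r_X, alpha=0.05, step=1, n_max=100_000):
    """Smallest N (a multiple of step) whose power reaches target.

    A plain linear search over N = step, 2*step, ...; only values with
    DF_E = N - r_X >= 1 are evaluated.
    """
    N = step
    while N <= n_max:
        if N > r_X and power_at_n(N, lam_per_unit, r_L, r_X, alpha) >= target:
            return N
        N += step
    raise ValueError("target power not reached by n_max")

# Hypothetical inputs: lambda per observation 1, r_L = 1, r_X = 3.
n_needed = min_ntotal(0.90, lam_per_unit=1.0, r_L=1, r_X=3, step=4)
print(n_needed)
```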

See Muller and Benignus (1992) and O’Brien and Shieh (1992) for additional discussion.