The PSMOOTH Procedure

Statistical Computations

Methods of Smoothing $p$-Values

PROC PSMOOTH offers three methods of combining $p$-values over specified sizes of sliding windows. For each value $w$ listed in the BANDWIDTH= option of the PROC PSMOOTH statement, a sliding window of size $2w+1$ is used; that is, the $p$-values for each set of $2w+1$ consecutive markers are considered in turn. The approach described by Zaykin et al. (2002) is implemented: the original $p$-value at the center of the sliding window is replaced by a function of the original $p$-value and the $p$-values from the $w$ nearest markers on each side, creating a new sequence of $p$-values. Note that for markers fewer than $w$ positions from the beginning or end of the data set (or BY group, if any variables are specified in the BY statement), the window is truncated and the number of hypotheses tested, $L$, is adjusted accordingly. The three methods of combining $p$-values from multiple hypotheses are Simes’ method, Fisher’s method, and the truncated product method (TPM), described in the following three sections. Plotting the new $p$-values against the original $p$-values reveals the smoothing effect of this technique.

Simes’ Method

Simes’ method of combining $p$-values (1986) is performed as follows when the SIMES option is specified in the PROC PSMOOTH statement: let $p_j$ be the original $p$-value at the center of the current sliding window, which contains $p_{j-w},\ldots ,p_{j+w}$. From these $L=2w+1$ $p$-values, the ordered $p$-values $p_{(1)}\leq \cdots \leq p_{(L)}$ are formed. The new value for $p_j$ is then $\min _{1\leq i \leq L}(Lp_{(i)}/i)$.

This method controls the Type I error rate even when the hypotheses are positively correlated (Sarkar and Chang, 1997), as is expected for nearby markers. Thus, if dependencies among the tests are suspected, this method is recommended for its conservativeness.
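As a concrete illustration (this is a Python sketch of the formulas above, not PROC PSMOOTH code; the function names are invented here), the window logic and Simes’ combination can be written as:

```python
# Illustrative sketch of Simes' method over a sliding window;
# names `simes` and `smooth` are invented, not part of PROC PSMOOTH.
def simes(window):
    """Combine the L p-values in a window: min over i of L * p_(i) / i,
    where p_(1) <= ... <= p_(L) are the ordered p-values."""
    L = len(window)
    return min(L * p / rank for rank, p in enumerate(sorted(window), start=1))

def smooth(pvals, w, combine):
    """Replace each p-value by combine() applied to its window of up to
    2w+1 markers; windows truncate near the ends, so L shrinks there."""
    n = len(pvals)
    return [combine(pvals[max(0, j - w):min(n, j + w + 1)]) for j in range(n)]
```

For example, with $w=1$, `smooth([0.01, 0.04, 0.9], 1, simes)` replaces the middle value with $\min(3\cdot 0.01/1,\ 3\cdot 0.04/2,\ 3\cdot 0.9/3)=0.03$, while the two edge markers use truncated windows with $L=2$.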

Fisher’s Method

When the FISHER option is specified in the PROC PSMOOTH statement, Fisher’s method of combining $p$-values (1932) is applied by replacing the $p$-value at the center of the current sliding window, $p_j$, with the $p$-value of the statistic $t$, where

\[  t = -2 \sum _{i=j-w}^{j+w} \log (p_ i)  \]

which has a $\chi ^2_{2L}$ distribution under the null hypothesis that all $L=2w+1$ hypotheses are true.

Caution: $t$ has a $\chi ^2$ distribution only under the assumption that the tests performed are mutually independent. When this assumption is violated, the probability of Type I error can exceed the significance level $\alpha $.
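As an illustrative Python sketch (not PROC PSMOOTH code; the name `fisher` is invented), the combined $p$-value for one window can be computed directly, since for even degrees of freedom $2L$ the $\chi^2$ survival function has a closed form:

```python
import math

# Illustrative sketch: Fisher's combined p-value for one window of L p-values.
def fisher(window):
    L = len(window)
    t = -2.0 * sum(math.log(p) for p in window)
    # P(chi^2_{2L} > t) for even degrees of freedom 2L equals
    # exp(-t/2) * sum_{s=0}^{L-1} (t/2)^s / s!
    half = t / 2.0
    return math.exp(-half) * sum(half ** s / math.factorial(s) for s in range(L))
```

With a single $p$-value ($L=1$), the formula returns the $p$-value unchanged, which is a useful sanity check.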

TPM

The TPM is a variation of Fisher’s method that leads to a different alternative hypothesis when $\tau $, the value specified in the TAU= option, is less than 1 (Zaykin et al., 2002). With the TPM, rejection of the null hypothesis implies that there is at least one false null hypothesis among those with $p$-values $\leq \tau $. To calculate a combined $p$-value by using the TPM for the $p$-value at the center of the sliding window, $p_ j$, the quantity $u$ must first be calculated as

\[  u = \prod _{i=j-w}^{j+w} p_ i^{I(p_ i\leq \tau )}  \]

Then the formula for the new value for the $p$-value at the center of the sliding window of $L$ markers is

\[  \sum _{k=1}^ L {L \choose k}(1-\tau )^{L-k}\bigg( u \sum _{s=0}^{k-1} \frac{(k \log \tau - \log u)^ s}{s!} I(u\leq \tau ^ k)+\tau ^ k I(u>\tau ^ k)\bigg)  \]

When TAU=1 is specified, the TPM and Fisher’s method are equivalent and the previous formula simplifies to

\[  u \sum _{s=0}^{L-1} \frac{(-\log u)^ s}{s!}  \]
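The two formulas above can be combined into one short Python sketch (again illustrative, not PROC PSMOOTH code; the name `tpm` is invented):

```python
import math

# Illustrative sketch of the TPM combined p-value for one window of L
# p-values; with tau = 1 it reduces to Fisher's method, as noted above.
def tpm(window, tau):
    L = len(window)
    # u: product of the p-values that do not exceed tau
    u = math.prod(p for p in window if p <= tau)
    total = 0.0
    for k in range(1, L + 1):
        if u <= tau ** k:
            inner = u * sum((k * math.log(tau) - math.log(u)) ** s / math.factorial(s)
                            for s in range(k))
        else:
            inner = tau ** k
        total += math.comb(L, k) * (1.0 - tau) ** (L - k) * inner
    return total
```

When TAU=1, every $p$-value enters the product $u$ and only the $k=L$ term survives, recovering Fisher’s method; for $\tau < 1$, $p$-values above the truncation point contribute only through the indicator terms.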

Multiple Testing Adjustments for $p$-Values

While the smoothing methods take into account the $p$-values from neighboring markers, the number of hypothesis tests performed does not change. Therefore, the Bonferroni, false discovery rate (FDR), and Šidák methods are offered by PROC PSMOOTH to adjust the smoothed $p$-values for multiple testing. The number of tests performed, $R$, is the number of valid observations in the current BY group if any variables are specified in the BY statement, or the number of valid observations in the entire data set if there are no variables specified in the BY statement. Note that these adjustments are not applied to the original column(s) of $p$-values; if you would like to adjust the original $p$-values for multiple testing, you must include a bandwidth of 0 in the BANDWIDTH= option of the PROC PSMOOTH statement along with one of the smoothing methods (SIMES, FISHER, or TPM).

For $R$ tests, the $p$-value $p_ i$ results in an adjusted $p$-value of $s_ i$ according to these methods:

Bonferroni adjustment:

$s_ i=\min (Rp_ i, 1.0), i=1,\ldots ,R$

Šidák adjustment (Šidák, 1967):

$s_ i=1-(1-p_ i)^ R, i=1,\ldots ,R$

FDR adjustment (Benjamini and Hochberg, 1995):
\begin{eqnarray*}
s_{(R)}   & = & p_{(R)} \\
s_{(R-1)} & = & \min \left( s_{(R)} , [R/(R-1)] p_{(R-1)} \right) \\
s_{(R-2)} & = & \min \left( s_{(R-1)} , [R/(R-2)] p_{(R-2)} \right) \\
\vdots & &
\end{eqnarray*}

where the $R$ $p$-values have been ordered as $p_{(1)} \le p_{(2)} \le \cdots \le p_{(R)}$. The Bonferroni and Šidák methods are conservative for controlling the family-wise error rate; however, often in the association mapping of a complex trait, it is desirable to control the FDR instead (Sabatti, Service, and Freimer, 2003).
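The three adjustments can be sketched in Python (illustrative only, not PROC PSMOOTH code; the function names are invented). The FDR sketch steps down from the largest ordered $p$-value, enforcing the monotonicity in the recursion above, and then restores the input order:

```python
# Illustrative sketches of the three multiple-testing adjustments
# applied to a list of R p-values.
def bonferroni(pvals):
    R = len(pvals)
    return [min(R * p, 1.0) for p in pvals]

def sidak(pvals):
    R = len(pvals)
    return [1.0 - (1.0 - p) ** R for p in pvals]

def fdr(pvals):
    # Benjamini-Hochberg: s_(R) = p_(R), then
    # s_(i) = min(s_(i+1), [R/i] p_(i)) working downward.
    R = len(pvals)
    order = sorted(range(R), key=lambda i: pvals[i])
    adjusted = [0.0] * R
    running = 1.0
    for rank in range(R, 0, -1):
        idx = order[rank - 1]
        running = min(running, (R / rank) * pvals[idx])
        adjusted[idx] = running
    return adjusted
```

For example, with $R=3$ and $p$-values $(0.01, 0.04, 0.9)$, the FDR-adjusted values are $(0.03, 0.06, 0.9)$, whereas Bonferroni gives $(0.03, 0.12, 1.0)$.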