The FREQ Procedure

Jonckheere-Terpstra Test

The JT option in the TABLES statement provides the Jonckheere-Terpstra test, which is a nonparametric test for ordered differences among classes. It tests the null hypothesis that the distribution of the response variable does not differ among classes. It is designed to detect alternatives of ordered class differences, which can be expressed as $\tau _{1} \leq \tau _{2} \leq \cdots \leq \tau _{R}$ (or $\tau _{1} \geq \tau _{2} \geq \cdots \geq \tau _{R}$ ), with at least one of the inequalities being strict, where $\tau _{i}$ denotes the effect of class i. For such ordered alternatives, the Jonckheere-Terpstra test can be preferable to tests of more general class difference alternatives, such as the Kruskal–Wallis test (produced by the WILCOXON option in the NPAR1WAY procedure). See Pirie (1983) and Hollander and Wolfe (1999) for more information about the Jonckheere-Terpstra test.

The Jonckheere-Terpstra test is appropriate for a two-way table in which an ordinal column variable represents the response. The row variable, which can be nominal or ordinal, represents the classification variable. The levels of the row variable should be ordered according to the ordering you want the test to detect. The order of variable levels is determined by the ORDER= option in the PROC FREQ statement. The default is ORDER=INTERNAL, which orders by unformatted values. If you specify ORDER=DATA, PROC FREQ orders values according to their order in the input data set. For more information about how to order variable levels, see the ORDER= option.

The Jonckheere-Terpstra test statistic is computed by first forming $R(R-1)/2$ Mann-Whitney counts $M_{i,i^\prime }$ , where $i < i^\prime$ , for pairs of rows in the contingency table,

$\begin{aligned} M_{i,i^\prime } = {} & \{ \mbox{number of times } X_{i,j} < X_{i^\prime ,j^\prime }, ~ j=1,\ldots ,n_{i \cdot }; ~ j^\prime =1,\ldots ,n_{i^\prime \cdot } \} \\ & + \frac{1}{2} \{ \mbox{number of times } X_{i,j} = X_{i^\prime ,j^\prime }, ~ j=1,\ldots ,n_{i \cdot }; ~ j^\prime =1,\ldots ,n_{i^\prime \cdot } \} \end{aligned}$

where $X_{i,j}$ is response j in row i. The Jonckheere-Terpstra test statistic is computed as

$J = \sum _{1 \leq i < i^\prime \leq R} M_{i,i^\prime }$
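As an illustration only, the following Python sketch computes J directly from the cell counts of a two-way table, using the definition of the Mann-Whitney counts given above. It is not PROC FREQ code. The function name `jt_statistic` and the small example table are hypothetical, and the sketch assumes that rows and columns are already arranged in the order the test should detect.

```python
from typing import Sequence


def jt_statistic(table: Sequence[Sequence[float]]) -> float:
    """Return J for an R x C table of counts (rows = classes, columns = ordered response)."""
    n_rows = len(table)
    n_cols = len(table[0])
    j_stat = 0.0
    for i in range(n_rows):
        for i2 in range(i + 1, n_rows):
            # Accumulate the Mann-Whitney count M_{i,i'} for this pair of rows.
            for c in range(n_cols):
                # Responses in row i' that fall in a strictly higher column than c.
                greater = sum(table[i2][c + 1:])
                # Ties (same column) contribute one half each.
                ties = table[i2][c]
                j_stat += table[i][c] * (greater + 0.5 * ties)
    return j_stat


# Hypothetical 2 x 2 table: row 1 = (2, 1), row 2 = (1, 2).
print(jt_statistic([[2, 1], [1, 2]]))  # 6.0
```

Because the responses are grouped into columns, the pairwise comparisons reduce to products of cell counts, so the individual responses $X_{i,j}$ never need to be enumerated.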

This test rejects the null hypothesis of no difference among classes for large values of J. Asymptotic p-values for the Jonckheere-Terpstra test are obtained by using the normal approximation for the distribution of the standardized test statistic. The standardized test statistic is computed as

$J^\ast = \left( J - \mr{E}_0(J) \right) ~ / ~ \sqrt {\mr{Var}_0(J)}$

where $\mr{E}_0(J)$ and $\mr{Var}_0(J)$ are the expected value and variance of the test statistic under the null hypothesis,

$\mr{E}_0(J) = \left( n^2 - \sum _{i}n_{i \cdot }^2 \right) / 4$

$\mr{Var}_0(J) = A / 72 + B / \left( 36n(n-1)(n-2) \right) + C / \left( 8n(n-1) \right)$

where

$A = n(n-1)(2n+5) - \sum _{i}n_{i \cdot }(n_{i \cdot }-1)(2n_{i \cdot }+5) - \sum _{j}n_{\cdot j}(n_{\cdot j}-1)(2n_{\cdot j}+5)$

$B = \left(\sum _{i}n_{i \cdot }(n_{i \cdot }-1)(n_{i \cdot }-2) \right) \left(\sum _{j}n_{\cdot j}(n_{\cdot j}-1)(n_{\cdot j}-2) \right)$

$C = \left(\sum _{i}n_{i \cdot }(n_{i \cdot }-1) \right) \left(\sum _{j}n_{\cdot j}(n_{\cdot j}-1) \right)$
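The formulas for the null expected value, the null variance, and the standardized statistic can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the PROC FREQ implementation. The names `jt_null_moments` and `jt_standardized` are hypothetical, the table must contain at least three observations (otherwise the B term divides by zero), and the value of J is supplied by the caller.

```python
import math
from typing import Sequence, Tuple


def jt_null_moments(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (E0(J), Var0(J)) from the row and column totals of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    def f_a(m: float) -> float:  # m(m-1)(2m+5), used in A
        return m * (m - 1) * (2 * m + 5)

    def f_b(m: float) -> float:  # m(m-1)(m-2), used in B
        return m * (m - 1) * (m - 2)

    def f_c(m: float) -> float:  # m(m-1), used in C
        return m * (m - 1)

    expected = (n ** 2 - sum(m ** 2 for m in row_totals)) / 4
    a = f_a(n) - sum(map(f_a, row_totals)) - sum(map(f_a, col_totals))
    b = sum(map(f_b, row_totals)) * sum(map(f_b, col_totals))
    c = sum(map(f_c, row_totals)) * sum(map(f_c, col_totals))
    variance = a / 72 + b / (36 * n * (n - 1) * (n - 2)) + c / (8 * n * (n - 1))
    return expected, variance


def jt_standardized(table: Sequence[Sequence[float]], j_stat: float) -> float:
    """Return J* = (J - E0(J)) / sqrt(Var0(J))."""
    expected, variance = jt_null_moments(table)
    return (j_stat - expected) / math.sqrt(variance)


# Hypothetical 2 x 2 table with rows (2, 1) and (1, 2), for which J = 6.
example = [[2, 1], [1, 2]]
print(jt_null_moments(example))       # E0(J) = 4.5, Var0(J) approximately 4.05
print(jt_standardized(example, 6.0))  # approximately 0.745
```

For this example, A = 246, B = 144, and C = 144, so the variance is 246/72 + 144/4320 + 144/240 = 4.05.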

PROC FREQ computes one-sided and two-sided p-values for the Jonckheere-Terpstra test. When the standardized test statistic is greater than its null hypothesis expected value of 0, PROC FREQ displays the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis of increasing order from row 1 to row R. When the standardized test statistic is less than or equal to 0, PROC FREQ displays the left-sided p-value. A small left-sided p-value supports the alternative of decreasing order from row 1 to row R.

The one-sided p-value for the Jonckheere-Terpstra test, $P_1$ , is computed as

$P_1 = \begin{cases} \mr{Prob} (Z > J^\ast ) & \mr{if} ~ J^\ast > 0 \\ \mr{Prob} (Z < J^\ast ) & \mr{if} ~ J^\ast \leq 0 \end{cases}$

where Z has a standard normal distribution. The two-sided p-value, $P_2$ , is computed as

$P_{2} = \mr{Prob} (|Z| > |J^\ast |)$
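The asymptotic p-value rules can be sketched in a few lines of Python. This is a hedged illustration of the formulas for $P_1$ and $P_2$, not PROC FREQ code. The function names are hypothetical, and the standard normal tail probability is obtained from the complementary error function in the Python standard library.

```python
import math
from typing import Tuple


def normal_upper_tail(z: float) -> float:
    """Prob(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def jt_asymptotic_p_values(j_star: float) -> Tuple[str, float, float]:
    """Return (side, P1, P2) for a standardized Jonckheere-Terpstra statistic J*."""
    if j_star > 0:
        side = "right"
        p_one = normal_upper_tail(j_star)   # Prob(Z > J*)
    else:
        side = "left"
        p_one = normal_upper_tail(-j_star)  # Prob(Z < J*), by symmetry of the normal
    p_two = 2 * normal_upper_tail(abs(j_star))  # Prob(|Z| > |J*|)
    return side, p_one, p_two


# Hypothetical standardized statistic.
side, p_one, p_two = jt_asymptotic_p_values(0.745)
print(side, p_one, p_two)
```

In this sketch the two-sided p-value is always twice the displayed one-sided p-value, because the normal distribution is symmetric about 0.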

PROC FREQ also provides exact p-values for the Jonckheere-Terpstra test. You can request the exact test by specifying the JT option in the EXACT statement. See the section Exact Statistics for more information.