The CORR Procedure

Kendall’s Tau-b Correlation Coefficient

Kendall’s tau-b is a nonparametric measure of association based on the number of concordances and discordances in paired observations. Two observations $(x_ i, y_ i)$ and $(x_ j, y_ j)$ are concordant when $x_ i-x_ j$ and $y_ i-y_ j$ have the same sign (the values vary together), and discordant when the two differences have opposite signs. The formula for Kendall’s tau-b is

\[ \tau = \frac{\sum _{i<j} \, (\mr{sgn}(x_ i-x_ j) \mr{sgn}(y_ i-y_ j))}{\sqrt {(T_0-T_1)(T_0-T_2)}} \]

where $T_0= n(n-1)/2$, $T_1= \sum _ k \,  t_ k(t_ k-1)/2$, and $T_2= \sum _ l \,  u_ l(u_ l-1)/2$. Here $t_ k$ is the number of tied x values in the kth group of tied x values, $u_ l$ is the number of tied y values in the lth group of tied y values, n is the number of observations, and $\mr{sgn}(z)$ is defined as

\[ \mr{sgn}(z) = \left\{ \begin{array}{ll} 1 & \mr{if} \, \, z > 0 \\ 0 & \mr{if} \, \, z = 0 \\ -1 & \mr{if} \, \, z < 0 \end{array} \right. \]
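The definition above can be evaluated directly by looping over all pairs. The following Python sketch is only an illustration of the formula, not the method PROC CORR uses; the function names `sgn` and `kendall_tau_b` are hypothetical, and the quadratic pair loop is practical only for small samples.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def sgn(z):
    """Return 1, 0, or -1 according to the sign of z."""
    return (z > 0) - (z < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b computed straight from the defining formula."""
    n = len(x)
    # Numerator: sum over all pairs i < j of sgn(x_i - x_j) * sgn(y_i - y_j).
    s = sum(
        sgn(xi - xj) * sgn(yi - yj)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
    )
    t0 = n * (n - 1) / 2
    # T1 and T2: numbers of pairs tied in x and in y, summed over tied groups.
    # Groups of size 1 contribute zero, so every distinct value can be included.
    t1 = sum(t * (t - 1) / 2 for t in Counter(x).values())
    t2 = sum(u * (u - 1) / 2 for u in Counter(y).values())
    # Assumption: neither variable is constant, so the denominator is nonzero.
    return s / sqrt((t0 - t1) * (t0 - t2))
```

For example, with x = (1, 2, 2, 3) and y = (1, 2, 3, 3), four pairs are concordant, none are discordant, and $T_0=6$, $T_1=T_2=1$, so the sketch returns $4/\sqrt{25} = 0.8$.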

PROC CORR computes Kendall’s tau-b by ranking the data and using a method similar to that of Knight (1966). The data are double sorted: the observations are first ranked according to values of the first variable and then reranked according to values of the second variable. PROC CORR computes Kendall’s tau-b from the number of interchanges of the first variable and corrects for tied pairs (pairs of observations with equal values of X or equal values of Y).
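The sort-and-count idea can be sketched in Python as follows. This is a generic illustration in the spirit of Knight's approach, not PROC CORR's actual implementation: it sorts the pairs by x (breaking ties by y), counts the interchanges a merge sort needs to put the y values in order, and then corrects for ties. The names `count_interchanges`, `tied_pairs`, and `kendall_tau_b_sorted` are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_interchanges(values):
    """Merge sort; return (sorted list, number of strictly out-of-order pairs)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_swaps = count_interchanges(values[:mid])
    right, right_swaps = count_interchanges(values[mid:])
    merged, swaps, i, j = [], left_swaps + right_swaps, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every element still waiting in left.
            merged.append(right[j])
            j += 1
            swaps += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, swaps


def tied_pairs(group_sizes):
    """Number of tied pairs, summed over groups of equal values."""
    return sum(c * (c - 1) // 2 for c in group_sizes)


def kendall_tau_b_sorted(x, y):
    """Tau-b from a sort by x followed by an interchange count on y."""
    n = len(x)
    data = sorted(zip(x, y))  # sort by x; ties in x are ordered by y
    _, discordant = count_interchanges([yi for _, yi in data])
    t0 = n * (n - 1) // 2
    t1 = tied_pairs(Counter(x).values())     # pairs tied in x
    t2 = tied_pairs(Counter(y).values())     # pairs tied in y
    t3 = tied_pairs(Counter(data).values())  # pairs tied in both x and y
    # Concordant pairs = t0 - t1 - t2 + t3 - discordant, so
    # s = concordant - discordant is:
    s = t0 - t1 - t2 + t3 - 2 * discordant
    # Assumption: neither variable is constant, so the denominator is nonzero.
    return s / sqrt((t0 - t1) * (t0 - t2))
```

Because the merge step needs on the order of $n \log n$ comparisons, this approach scales to much larger samples than a loop over all $n(n-1)/2$ pairs.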

Probability Values

Probability values for Kendall’s tau-b are computed by treating

\[ \frac{s}{\sqrt {V(s)}} \]

as coming from a standard normal distribution where

\[ s=\sum _{i<j} \, (\mr{sgn} (x_ i-x_ j) \mr{sgn} (y_ i-y_ j)) \]

and $V(s)$, the variance of s, is computed as

\[ V(s)=\frac{v_0-v_ t-v_ u}{18}+\frac{v_1}{2n(n-1)}+\frac{v_2}{9n(n-1)(n-2)} \]

where

$v_0=n(n-1)(2n+5)$

$v_ t=\sum _ k \,  t_ k (t_ k-1)(2t_ k+5)$

$v_ u=\sum _ l \,  u_ l (u_ l-1)(2u_ l+5)$

$v_1=(\sum _ k \,  t_ k(t_ k-1)) \,  (\sum _ l \,  u_ l(u_ l-1))$

$v_2=(\sum _ k \,  t_ k(t_ k-1)(t_ k-2)) \,  (\sum _ l \,  u_ l(u_ l-1)(u_ l-2))$

The sums are over tied groups of values, where $t_ k$ is the number of tied x values in the kth group and $u_ l$ is the number of tied y values in the lth group (Noether 1967). The sampling distribution of Kendall’s partial tau-b is unknown; therefore, probability values for the partial coefficient are not available.
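The variance formula and the normal approximation above can be sketched in Python as follows. The function names `kendall_s_variance` and `kendall_p_value` are hypothetical, and the two-sided form of the probability value is an assumption of this sketch; the text states only that $s/\sqrt{V(s)}$ is treated as standard normal.

```python
from collections import Counter
from itertools import combinations
from math import erfc, sqrt


def kendall_s_variance(x, y):
    """V(s) with tie corrections, following the formula in the text."""
    n = len(x)
    t = list(Counter(x).values())  # sizes of groups of equal x values
    u = list(Counter(y).values())  # sizes of groups of equal y values
    # Groups of size 1 add zero to every sum below, so they are harmless.
    v0 = n * (n - 1) * (2 * n + 5)
    vt = sum(k * (k - 1) * (2 * k + 5) for k in t)
    vu = sum(k * (k - 1) * (2 * k + 5) for k in u)
    v1 = sum(k * (k - 1) for k in t) * sum(k * (k - 1) for k in u)
    v2 = (sum(k * (k - 1) * (k - 2) for k in t)
          * sum(k * (k - 1) * (k - 2) for k in u))
    var = (v0 - vt - vu) / 18 + v1 / (2 * n * (n - 1))
    if n > 2:  # v2 is zero when n <= 2; skipping avoids dividing by zero
        var += v2 / (9 * n * (n - 1) * (n - 2))
    return var


def kendall_p_value(x, y):
    """Return (s, V(s), p), with p from a two-sided normal approximation."""
    s = sum(
        ((xi > xj) - (xi < xj)) * ((yi > yj) - (yi < yj))
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
    )
    var = kendall_s_variance(x, y)
    z = s / sqrt(var)
    # Assumption: two-sided test, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return s, var, erfc(abs(z) / sqrt(2))
```

With no ties, the correction terms vanish and $V(s)$ reduces to $n(n-1)(2n+5)/18$. For n = 4 this gives $156/18 \approx 8.67$.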