The CORR Procedure

Hoeffding Dependence Coefficient

Hoeffding’s measure of dependence, $D$, is a nonparametric measure of association that detects more general departures from independence. The statistic approximates a weighted sum over observations of chi-square statistics for two-by-two classification tables (Hoeffding, 1948). Each set of $(x, y)$ values serves as the cut points for the classification. The formula for Hoeffding’s $D$ is

$D = 30 \, \frac{(n-2)(n-3)D_1+D_2-2(n-2)D_3}{n(n-1)(n-2)(n-3)(n-4)}$

where $D_1 = \sum_i (Q_i-1)(Q_i-2)$, $D_2 = \sum_i (R_i-1)(R_i-2)(S_i-1)(S_i-2)$, and $D_3 = \sum_i (R_i-2)(S_i-2)(Q_i-1)$. $R_i$ is the rank of $x_i$, $S_i$ is the rank of $y_i$, and $Q_i$ (also called the bivariate rank) is 1 plus the number of points with both $x$ and $y$ values less than those of the $i$th point.

A point that is tied on only the $x$ value or the $y$ value contributes 1/2 to $Q_i$ if the other value is less than the corresponding value for the $i$th point. A point that is tied on both $x$ and $y$ contributes 1/4 to $Q_i$.

PROC CORR obtains the $Q_i$ values by first ranking the data. The data are then double sorted by ranking observations according to values of the first variable and reranking the observations according to values of the second variable. Hoeffding’s $D$ statistic is computed using the number of interchanges of the first variable. When no ties occur among data set observations, the $D$ statistic values are between $-0.5$ and 1, with 1 indicating complete dependence. However, when ties occur, the $D$ statistic might result in a smaller value. That is, for a pair of variables with identical values, Hoeffding’s $D$ statistic might be less than 1. With a large number of ties in a small data set, the $D$ statistic might be less than $-0.5$. For more information about Hoeffding’s $D$, see Hollander and Wolfe (1999).
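The definitions above can be illustrated with a minimal Python sketch. It is not how PROC CORR computes the statistic: instead of the double sort and interchange count, it counts points directly in quadratic time. The function name `hoeffding_d` is hypothetical, and the use of average ranks (midranks) for tied $R_i$ and $S_i$ values is an assumption that the text does not state.

```python
def hoeffding_d(x, y):
    """Illustrative O(n^2) computation of Hoeffding's D from its definition."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 5:
        raise ValueError("the denominator requires at least 5 observations")

    def midranks(v):
        # Assumption: tied values receive the average of their ranks.
        return [1 + sum(1 for b in v if b < a)
                + 0.5 * (sum(1 for b in v if b == a) - 1)
                for a in v]

    r = midranks(x)  # R_i
    s = midranks(y)  # S_i

    # Q_i: 1 plus the (tie-weighted) number of points below point i on both axes.
    q = []
    for i in range(n):
        qi = 1.0
        for j in range(n):
            if j == i:
                continue
            if x[j] < x[i] and y[j] < y[i]:
                qi += 1.0    # strictly less on both variables
            elif x[j] == x[i] and y[j] == y[i]:
                qi += 0.25   # tied on both variables
            elif (x[j] == x[i] and y[j] < y[i]) or (x[j] < x[i] and y[j] == y[i]):
                qi += 0.5    # tied on one variable, less on the other
        q.append(qi)

    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))

    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator
```

For untied data that are perfectly monotone in either direction, for example `hoeffding_d([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])`, the sketch returns 1. For identical variables with ties, such as `[1, 1, 2, 2, 3, 3]` paired with itself, it returns a value below 1.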

Probability Values

The probability values for Hoeffding’s $D$ statistic are computed using the asymptotic distribution derived by Blum, Kiefer, and Rosenblatt (1961). The formula is

$\frac{(n-1)\pi ^{4}}{60}D + \frac{\pi ^4}{72}$

which comes from the asymptotic distribution. If the sample size is less than 10, refer to the tables for the distribution of $D$ in Hollander and Wolfe (1999).
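As a small illustration, the expression above can be evaluated for a given $D$ and sample size $n$. The function name `scaled_hoeffding_statistic` is hypothetical. The sketch only evaluates the formula and does not convert the result to a probability value, because the asymptotic distribution itself is not reproduced here.

```python
import math


def scaled_hoeffding_statistic(d, n):
    """Evaluate (n - 1) * pi^4 / 60 * D + pi^4 / 72 for a given D and n."""
    return (n - 1) * math.pi ** 4 / 60 * d + math.pi ** 4 / 72
```

For example, with $D = 0$ the expression reduces to $\pi^4/72$ regardless of $n$.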