Hodges-Lehmann Estimation of Location Shift :: SAS/STAT(R) 13.1 User's Guide

Hodges-Lehmann Estimation of Location Shift

If you specify the HL option, PROC NPAR1WAY computes the Hodges-Lehmann estimate of location shift for two-sample data. This option also provides asymptotic confidence limits for the location shift (which are sometimes known as Moses confidence limits). You can specify the confidence level by using the ALPHA= option in the PROC NPAR1WAY statement. The default of ALPHA=0.05 produces 95% confidence limits. Additionally, you can request exact confidence limits for the location shift by specifying the HL option in the EXACT statement.

The Hodges-Lehmann estimator of location shift is associated with the Wilcoxon linear rank statistic. See Hollander and Wolfe (1999) and Hodges and Lehmann (1983) for details.

PROC NPAR1WAY computes the Hodges-Lehmann estimate $\hat{\Delta }$ as the median of all paired differences between observations in the two samples (classes), which can be written as

$\hat{\Delta } = \mr {median} \left( ~ ( Y_ j - X_ i ) \quad \mr {where} \hspace{.1in} j = 1,2,\ldots ,n_1; ~ i = 1,2,\ldots ,n_2 ~ \right)$

The $Y_ j$ are observations in class 1, the $X_ i$ are observations in class 2, and $n_1$ and $n_2$ denote the number of observations in class 1 and class 2, respectively.

By default, PROC NPAR1WAY uses the larger of the two classes as the reference class X (class 2). If both classes have the same number of observations, PROC NPAR1WAY uses the class that appears second in the input data set as the reference class. You can specify the reference class by using the HL(REFCLASS=) option. REFCLASS=1 refers to the first class that is listed in the “Wilcoxon Scores” table, and REFCLASS=2 refers to the second class in the table. REFCLASS=class-value identifies the reference class by the formatted value of the CLASS variable.

Let m denote the total number of differences ( $n_1 \times n_2$ ), and let $U^{(k)}$ denote the kth smallest value of $(Y_ j - X_ i)$ among the ordered differences. When m is an odd number, the median difference is the value with rank $(m + 1) / 2$ ,

$\hat{\Delta } = U^{(k)} \quad \mr {where} \hspace{.1in} k = (m + 1) / 2$

When m is an even number, the median difference is the average of the values with ranks $(m / 2)$ and $((m / 2) + 1)$ ,

$\hat{\Delta } = \left( U^{(k)} + U^{(k+1)} \right) / 2 \quad \mr {where} \hspace{.1in} k = m / 2$
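The odd and even cases above can be sketched in a few lines of Python. This is an illustration of the formulas only, not the procedure's implementation; the function name `hodges_lehmann` and the sample values are hypothetical.

```python
def hodges_lehmann(y, x):
    """Median of all n1*n2 differences y_j - x_i (x is the reference class)."""
    # Ordered differences U^(1) <= ... <= U^(m), stored 0-based in u.
    u = sorted(yj - xi for yj in y for xi in x)
    m = len(u)
    if m % 2 == 1:
        k = (m + 1) // 2            # odd m: the value with rank (m + 1) / 2
        return u[k - 1]
    k = m // 2                      # even m: average of ranks m/2 and m/2 + 1
    return (u[k - 1] + u[k]) / 2


y = [1.5, 2.0, 3.5]   # class 1 (made-up data)
x = [1.0, 2.5]        # class 2, the reference class (made-up data)
print(hodges_lehmann(y, x))
```

Here the six ordered differences are -1.0, -0.5, 0.5, 1.0, 1.0, 2.5, so the estimate is the average of the third and fourth values.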

Following Hollander and Wolfe (1999), the asymptotic lower and upper confidence limits for the location shift are

$\left( ~ \Delta _{\mi {L}} = U^{(C_{\alpha })}, \hspace{.10in} \Delta _{\mi {U}} = U^{(m + 1 - C_{\alpha })} ~ \right)$

where $C_{\alpha }$ is the largest integer less than or equal to $C_{\alpha }^{~ *}$ , which is computed as

$C_{\alpha }^{~ *} = \mr {E_0}(S) - z_{\alpha /2} \sqrt { \mr {Var_0}(S) }$

where $\mr {E_0}(S)$ and $\mr {Var_0}(S)$ are the expected value and variance, respectively, of the Wilcoxon statistic S under the null hypothesis (as described in the section Simple Linear Rank Tests for Two-Sample Data), and $z_{\alpha /2}$ is the $100(1 - \alpha /2)$th percentile of the standard normal distribution. For Wilcoxon rank scores,

$\mr {E_0}(S) = n_1 n_2 / 2$

When there are no tied values, $\mr {Var_0}(S)$ for Wilcoxon scores equals

$\mr {Var_0}(S) = n_1 n_2 ( n_1 + n_2 + 1 ) / 12$
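As a rough sketch of how these quantities combine, the following Python function computes $C_{\alpha }$ and reads the two limits from the ordered differences. It assumes no tied values (so the no-ties variance applies) and is not the procedure's own code; the name `asymptotic_hl_limits` and the data are hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist


def asymptotic_hl_limits(y, x, alpha=0.05):
    """Asymptotic limits (U^(C), U^(m + 1 - C)), assuming no tied values."""
    n1, n2 = len(y), len(x)
    m = n1 * n2
    u = sorted(yj - xi for yj in y for xi in x)   # ordered differences
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    e0 = n1 * n2 / 2                              # E_0(S)
    var0 = n1 * n2 * (n1 + n2 + 1) / 12           # Var_0(S) without ties
    c = floor(e0 - z * sqrt(var0))                # C_alpha
    if c < 1:
        raise ValueError("samples too small for this confidence level")
    # Ranks c and m + 1 - c are 1-based; the list u is 0-based.
    return u[c - 1], u[m - c]


y = [6, 7, 8, 9, 10]   # class 1 (made-up data)
x = [1, 2, 3, 4, 5]    # class 2, the reference class (made-up data)
print(asymptotic_hl_limits(y, x))
```

For these two samples of size 5, $C_{\alpha }^{~ *}$ is approximately 3.12, so $C_{\alpha } = 3$ and the limits are the 3rd and 23rd of the 25 ordered differences.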

PROC NPAR1WAY displays the midpoint of the confidence interval $( \Delta _{\mi {L}}, \Delta _{\mi {U}} )$ , which can also be used as an estimate of location shift. See Lehmann (1963) for details. Additionally, PROC NPAR1WAY provides an estimate of the asymptotic standard error of $\hat{\Delta }$ based on the length of the confidence interval, which is computed as

$\mr {se}(\hat{\Delta }) = ( \Delta _{\mi {U}} - \Delta _{\mi {L}} ) ~ / ~ (2 ~ z_{\alpha /2})$
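A minimal sketch of the midpoint and standard error computations, given confidence limits that were already obtained; the function name `midpoint_and_se` and the limits (2, 8) are hypothetical illustration values.

```python
from statistics import NormalDist


def midpoint_and_se(lower, upper, alpha=0.05):
    """Interval midpoint and the standard error implied by the interval length."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    midpoint = (lower + upper) / 2
    se = (upper - lower) / (2 * z)
    return midpoint, se


mid, se = midpoint_and_se(2, 8)   # limits assumed to come from a 95% interval
print(mid, round(se, 4))
```

Note that the same ALPHA value that produced the limits must be used for $z_{\alpha /2}$ here; otherwise the standard error is scaled incorrectly.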

Exact Confidence Limits

If you specify the HL option in the EXACT statement, PROC NPAR1WAY computes exact confidence limits for the location shift between the two samples. You can specify the level of the confidence limits by using the ALPHA= option in the PROC NPAR1WAY statement. The default of ALPHA=0.05 produces 95% confidence limits.

PROC NPAR1WAY computes exact confidence limits for the location shift as described in Randles and Wolfe (1979, p. 180). PROC NPAR1WAY first generates the exact conditional distribution of the Mann-Whitney U statistic, which equals the number of pairwise differences $(Y_ j - X_ i)$ that are positive, plus half the number of pairwise differences that are zero. The Mann-Whitney statistic is defined as

$\mi {MW} = \sum _{j=1}^{n_1} \sum _{i=1}^{n_2} \phi \left( Y_ j, X_ i \right)$

where

$\phi (Y_ j, X_ i) = \left\{ \begin{array}{ll} 1 & \mr {if} \hspace{.1in} Y_ j > X_ i \\[0.10in] 1 / 2 & \mr {if} \hspace{.1in} Y_ j = X_ i \\[0.10in] 0 & \mr {otherwise} \\ \end{array} \right.$
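The double sum translates directly into code. The sketch below is only an illustration of the definition; the function name `mann_whitney` and the data are hypothetical.

```python
def mann_whitney(y, x):
    """Sum of phi(y_j, x_i): 1 if y_j > x_i, 1/2 if tied, 0 otherwise."""
    total = 0.0
    for yj in y:
        for xi in x:
            if yj > xi:
                total += 1.0
            elif yj == xi:
                total += 0.5
    return total


print(mann_whitney([1.5, 2.0, 3.5], [1.0, 2.5]))   # four positive differences
print(mann_whitney([1, 2], [2, 3]))                # one tied pair only
```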

From the exact conditional distribution of the Mann-Whitney statistic $\mi {MW}$ , PROC NPAR1WAY chooses $C_{\mi {L},\alpha }^{*}$ as the smallest value such that $\mr {Prob}(\mi {MW} \geq C_{\mi {L},\alpha }^{*}) \leq \alpha /2$ . Rounding $C_{\mi {L},\alpha }^{*}$ up to the nearest integer $C_{\mi {L},\alpha }$ , the lower confidence limit equals the difference $(Y_ j - X_ i)$ that has a rank of $(n_1 n_2 - C_{\mi {L},\alpha } + 1)$ .

To find the upper confidence limit, PROC NPAR1WAY chooses $C_{\mi {U},\alpha }^{*}$ as the largest Mann-Whitney value such that $\mr {Prob}(\mi {MW} \leq C_{\mi {U},\alpha }^{*}) \leq \alpha /2$ . Rounding $C_{\mi {U},\alpha }^{*}$ down to the nearest integer $C_{\mi {U},\alpha }$ , the upper confidence limit equals the difference $(Y_ j - X_ i)$ that has a rank of $(n_1 n_2 - C_{\mi {U},\alpha })$ .

Because this is a discrete problem, the confidence coefficient is not exactly $(1 - \alpha )$ but is at least $(1 - \alpha )$; thus, these confidence limits are conservative.
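The rank rules for the exact limits can be sketched for the special case of small samples with no tied values, where the null distribution of the Mann-Whitney statistic depends only on the sample sizes and takes integer values (so no rounding is needed). This simplification is an assumption of the sketch, as is the brute-force enumeration; it does not reproduce the conditional distribution that the procedure generates when ties are present. The names `mw_null_counts` and `exact_hl_limits` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mw_null_counts(n1, n2):
    """Null counts of each Mann-Whitney value, assuming no ties (small samples)."""
    offset = n1 * (n1 + 1) // 2   # smallest possible rank sum for class 1
    ranks = range(1, n1 + n2 + 1)
    return Counter(sum(r) - offset for r in combinations(ranks, n1))


def exact_hl_limits(y, x, alpha=0.05):
    """Exact limits from the ranks n1*n2 - C_L + 1 and n1*n2 - C_U."""
    n1, n2 = len(y), len(x)
    m = n1 * n2
    u = sorted(yj - xi for yj in y for xi in x)   # ordered differences
    counts = mw_null_counts(n1, n2)
    total = sum(counts.values())

    def upper_tail(c):   # Prob(MW >= c)
        return sum(n for v, n in counts.items() if v >= c) / total

    def lower_tail(c):   # Prob(MW <= c)
        return sum(n for v, n in counts.items() if v <= c) / total

    c_l = min((c for c in range(m + 1) if upper_tail(c) <= alpha / 2), default=None)
    c_u = max((c for c in range(m + 1) if lower_tail(c) <= alpha / 2), default=None)
    if c_l is None or c_u is None:
        raise ValueError("samples too small for this confidence level")
    # Convert the 1-based ranks m - c_l + 1 and m - c_u to 0-based indexes.
    return u[m - c_l], u[m - c_u - 1]


y = [6, 7, 8, 9, 10]   # class 1 (made-up data)
x = [1, 2, 3, 4, 5]    # class 2, the reference class (made-up data)
print(exact_hl_limits(y, x))
```

With two samples of size 5 and ALPHA=0.05, this gives $C_{\mi {L},\alpha } = 23$ and $C_{\mi {U},\alpha } = 2$, so the limits are the 3rd and 23rd of the 25 ordered differences. Enumerating all rank subsets is practical only for small samples.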
