The UNIVARIATE Procedure

Tests for Location

PROC UNIVARIATE provides three tests for location: Student’s $t$ test, the sign test, and the Wilcoxon signed rank test. All three tests produce a test statistic for the null hypothesis that the mean or median is equal to a given value $\mu _0$ against the two-sided alternative that the mean or median is not equal to $\mu _0$. By default, PROC UNIVARIATE sets the value of $\mu _0$ to zero. You can use the MU0= option in the PROC UNIVARIATE statement to specify the value of $\mu _0$. Student’s $t$ test is appropriate when the data are from an approximately normal population; otherwise, use nonparametric tests such as the sign test or the signed rank test. For large samples, the $t$ test is asymptotically equivalent to a $z$ test. If you use the WEIGHT statement, PROC UNIVARIATE computes only one weighted test for location, the $t$ test, and you must use the default value of the VARDEF= option in the PROC statement (VARDEF=DF). See Example 4.12.

You can also use these tests to compare means or medians of paired data. Data are said to be paired when subjects or units are matched in pairs according to one or more variables, such as pairs of subjects with the same age and gender. Paired data also occur when each subject or unit is measured at two times or under two conditions. To compare the means or medians of the two measurements, create an analysis variable that is the difference between the two measures. The test that the mean or the median difference of the variables equals zero is equivalent to the test that the means or medians of the two original variables are equal. Note that you can also carry out these tests by using the PAIRED statement in the TTEST procedure; see Chapter 99: The TTEST Procedure in SAS/STAT 12.3 User's Guide. Also see Example 4.13.
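As an illustration outside SAS, the following Python sketch builds the difference variable for paired data. The measurements and the names `before`, `after`, and `diff` are hypothetical and are not taken from any SAS example. The sketch only shows that the mean of the differences equals the difference of the two means, which is why a test of a zero mean difference also addresses equality of the two means.

```python
from statistics import mean

# Hypothetical paired measurements: the same six subjects measured twice.
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
after = [11.6, 11.5, 12.2, 12.0, 11.8, 11.9]

# The analysis variable is the within-pair difference.
diff = [b - a for b, a in zip(before, after)]

# The mean of the differences matches the difference of the means
# (up to floating-point rounding).
mean_diff = mean(diff)
diff_of_means = mean(before) - mean(after)
print(mean_diff, diff_of_means)
```

A location test with $\mu _0 = 0$ would then be applied to `diff` rather than to either original variable.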

Student’s t Test

PROC UNIVARIATE calculates the $t$ statistic as

\[  t=\frac{\bar{x}-\mu _0}{s/\sqrt {n}}  \]

where $\bar{x}$ is the sample mean, $n$ is the number of nonmissing values for a variable, and $s$ is the sample standard deviation. The null hypothesis is that the population mean equals $\mu _0$. When the data values are approximately normally distributed, the probability under the null hypothesis of a $t$ statistic that is as extreme as, or more extreme than, the observed value (the $p$-value) is obtained from the $t$ distribution with $n-1$ degrees of freedom. For large $n$, the $t$ test is asymptotically equivalent to a $z$ test. When you use the WEIGHT statement and the default value of VARDEF=, which is DF, the $t$ statistic is calculated as

\[  t_ w =\frac{\bar{x}_ w -\mu _0 }{s_ w / \sqrt {\sum _{i=1}^{n}w_ i} }  \]

where $\bar{x}_ w$ is the weighted mean, $s_ w$ is the weighted standard deviation, and $w_ i$ is the weight for the $i$th observation. The $t_ w$ statistic is treated as having a Student’s $t$ distribution with $n-1$ degrees of freedom. If you specify the EXCLNPWGT option in the PROC statement, $n$ is the number of nonmissing observations for which the value of the WEIGHT variable is positive. By default, $n$ is the number of nonmissing observations for the WEIGHT variable.
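The two formulas for $t$ and $t_ w$ can be checked numerically with a short Python sketch. This is an illustration, not PROC UNIVARIATE's implementation: the function names and data are hypothetical, and the weighted standard deviation is assumed to use the divisor $n-1$, that is, $s_ w^2 = \sum w_ i (x_ i - \bar{x}_ w)^2 / (n-1)$ under VARDEF=DF.

```python
from math import sqrt

def t_stat(x, mu0=0.0):
    """Unweighted t statistic: (xbar - mu0) / (s / sqrt(n))."""
    n = len(x)
    xbar = sum(x) / n
    s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return (xbar - mu0) / (s / sqrt(n))

def weighted_t_stat(x, w, mu0=0.0):
    """Weighted t statistic: (xbar_w - mu0) / (s_w / sqrt(sum of weights))."""
    n = len(x)
    sum_w = sum(w)
    xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    # Assumption: VARDEF=DF means the weighted sum of squares is divided by n - 1.
    s_w = sqrt(sum(wi * (xi - xbar_w) ** 2 for wi, xi in zip(w, x)) / (n - 1))
    return (xbar_w - mu0) / (s_w / sqrt(sum_w))

# Hypothetical data; mu0 = 4.5 plays the role of the MU0= value.
x = [4.1, 5.3, 4.8, 5.9, 5.0]
t_plain = t_stat(x, mu0=4.5)
t_unit = weighted_t_stat(x, [1.0] * len(x), mu0=4.5)
print(t_plain, t_unit)
```

With all weights equal to one, $\sum w_ i = n$ and $s_ w = s$, so the weighted statistic reduces to the unweighted one. The $p$-value would come from a $t$ distribution with $n-1$ degrees of freedom, which the sketch does not compute.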

Sign Test

PROC UNIVARIATE calculates the sign test statistic as

\[  M=(n^+ -n^- )/2  \]

where $n^+$ is the number of values that are greater than $\mu _0$, and $n^-$ is the number of values that are less than $\mu _0$. Values equal to $\mu _0$ are discarded. Under the null hypothesis that the population median is equal to $\mu _0$, the $p$-value for the observed statistic $M_{obs}$ is

\[  \mathrm{Pr}(|M_{obs}| \geq |M|)=0.5^{(n_ t -1)} \sum _{j=0}^{\min (n^+ ,n^-)} \binom{n_ t}{j}  \]

where $n_ t=n^+ +n^-$ is the number of $x_ i$ values not equal to $\mu _0$.

Note: If $n^+$ and $n^-$ are equal, the $p$-value is equal to one.
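The sign test computation can be sketched in Python as follows. The function name `sign_test` and the data are hypothetical, and capping the result at one is an assumption made here so that the sketch agrees with the note above when $n^+ = n^-$.

```python
from math import comb

def sign_test(x, mu0=0.0):
    """Return (M, p) for the two-sided sign test of median = mu0."""
    n_pos = sum(1 for xi in x if xi > mu0)
    n_neg = sum(1 for xi in x if xi < mu0)
    n_t = n_pos + n_neg  # values equal to mu0 are discarded
    m = (n_pos - n_neg) / 2
    # Binomial tail sum from j = 0 to min(n+, n-).
    tail = sum(comb(n_t, j) for j in range(min(n_pos, n_neg) + 1))
    # Assumption: the p-value is capped at 1, as in the balanced case.
    p = min(1.0, 0.5 ** (n_t - 1) * tail)
    return m, p

# Hypothetical data: six values above zero, one below, one equal to zero.
m, p = sign_test([1.2, 0.4, 2.1, -0.3, 0.0, 1.7, 0.9, 2.5])
print(m, p)  # 2.5 0.125
```

Here $n^+ = 6$, $n^- = 1$, and $n_ t = 7$, so $M = 2.5$ and the $p$-value is $0.5^{6}\,(1 + 7) = 0.125$.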

Wilcoxon Signed Rank Test

The signed rank statistic $S$ is computed as

\[  S =\sum _{ i: x_ i - \mu _0 > 0} r_ i^+ - \frac{n_ t (n_ t+1)}{4}  \]

where $r_ i^+$ is the rank of $|x_ i-\mu _0|$ after discarding values of $x_ i = \mu _0$, and $n_ t$ is the number of $x_ i$ values not equal to $\mu _0$. Average ranks are used for tied values.
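A minimal Python sketch of the signed rank statistic follows. It assumes the usual definition in which the ranks of the positive differences are summed and then centered by $n_ t(n_ t+1)/4$. The function name and the data are hypothetical.

```python
def signed_rank_statistic(x, mu0=0.0):
    """Return (S, n_t): centered sum of ranks of the positive differences."""
    d = [xi - mu0 for xi in x if xi != mu0]  # discard values equal to mu0
    n_t = len(d)
    # Indices of d ordered by absolute difference.
    order = sorted(range(n_t), key=lambda i: abs(d[i]))
    ranks = [0.0] * n_t
    start = 0
    while start < n_t:
        # Extend the group while the absolute differences are tied
        # (exact equality; real data may need a tolerance).
        end = start
        while end + 1 < n_t and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    positive_rank_sum = sum(r for r, di in zip(ranks, d) if di > 0)
    return positive_rank_sum - n_t * (n_t + 1) / 4, n_t

# Hypothetical data: one zero is discarded; |2| and |-2| are tied.
s, n_t = signed_rank_statistic([3, -1, 2, -2, 5, 0])
print(s, n_t)  # 4.0 5
```

In this example the absolute differences 1, 2, 2, 3, 5 receive ranks 1, 2.5, 2.5, 4, 5. The positive values 3, 2, and 5 contribute $4 + 2.5 + 5 = 11.5$, and subtracting $5 \cdot 6 / 4 = 7.5$ gives $S = 4$.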

If $n_ t \leq 20$, the significance of $S$ is computed from the exact distribution of $S$, where the distribution is a convolution of scaled binomial distributions. When $n_ t > 20$, the significance of $S$ is computed by treating

\[  S \sqrt { \frac{n_ t - 1}{n_ tV -S^2} }  \]

as a Student’s $t$ variate with $n_ t - 1$ degrees of freedom. $V$ is computed as

\[  V = \frac{1}{24} n_ t(n_ t+1)(2n_ t+1) - \frac{1}{48} \sum t_ i(t_ i+1)(t_ i-1)  \]

where the sum is over groups tied in absolute value and where $t_ i$ is the number of values in the $i$th group (Iman, 1974; Conover, 1980). The null hypothesis tested is that the mean (or median) is $\mu _0$, assuming that the distribution is symmetric. Refer to Lehmann and D’Abrera (1975).
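The large-sample approximation can be sketched in Python as follows. The function name and inputs are hypothetical: `s` is a signed rank statistic that has already been computed, and `abs_diffs` holds the nonzero absolute differences $|x_ i - \mu _0|$. The sketch stops at the $t$ variate; obtaining the $p$-value would require the $t$ distribution with $n_ t - 1$ degrees of freedom, which the Python standard library does not provide.

```python
from collections import Counter
from math import sqrt

def signed_rank_t_approx(s, abs_diffs):
    """Return (V, t variate) for the large-sample signed rank approximation."""
    n_t = len(abs_diffs)
    # Sizes of the groups tied in absolute value; untied values have size 1
    # and contribute zero to the correction term.
    tie_sizes = Counter(abs_diffs).values()
    v = (n_t * (n_t + 1) * (2 * n_t + 1) / 24
         - sum(t * (t + 1) * (t - 1) for t in tie_sizes) / 48)
    t_variate = s * sqrt((n_t - 1) / (n_t * v - s ** 2))
    return v, t_variate

# Hypothetical case: 25 untied absolute differences and S = 60.
v, t_variate = signed_rank_t_approx(60.0, list(range(1, 26)))
print(v, t_variate)
```

With no ties and $n_ t = 25$, the correction term vanishes and $V = 25 \cdot 26 \cdot 51 / 24 = 1381.25$.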