The SURVEYMEANS Procedure

Quantiles

Let Y be the variable of interest in a complex survey, and let $F(t)=\Pr (Y\le t)$ denote the cumulative distribution function of Y. For $0<p<1$, the pth quantile of the population cumulative distribution function is

\[  Y_ p=\inf \{ y: F(y)\ge p\}   \]

Estimate of Quantile

Let $\{ y_{hij}, w_{hij}\} $ be the observed values of variable Y and their associated sampling weights, where $(h,i,j)$ are the stratum index, cluster index, and member index, respectively, as defined in the section Definitions and Notation. Let $ y_{(1)} < y_{(2)} < \cdots < y_{(d)}$ denote the sample order statistics for variable Y.

An estimate of quantile $Y_ p$ is

\[  \hat Y_ p= \left\{  \begin{array}{ll} y_{(1)} &  \mbox{ if } p<\hat F(y_{(1)}) \\ y_{(k)}+\displaystyle {\frac{p-\hat F(y_{(k)})}{\hat F(y_{(k+1)})-\hat F(y_{(k)})}} (y_{(k+1)}-y_{(k)}) &  \mbox{ if } \hat F(y_{(k)}) \le p < \hat F(y_{(k+1)}) \\ y_{(d)} &  \mbox{ if } p=1 \end{array} \right.  \]

where $\hat F(t)$ is the estimated cumulative distribution for Y:

\[  \hat F(t)=\frac{\sum _{h=1}^ H\sum _{i=1}^{n_ h} \sum _{j=1}^{m_{hi}}w_{hij}I(y_{hij}\le t)}{\sum _{h=1}^ H\sum _{i=1}^{n_ h}\sum _{j=1}^{m_{hi}}w_{hij}}  \]

and $I(\cdot )$ is the indicator function.
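The estimator above can be sketched in a few lines of Python. This is an illustrative sketch of the formulas only, not the procedure's implementation; the function names `weighted_ecdf` and `estimate_quantile` are hypothetical, and the sketch assumes that tied observations are collapsed into distinct values so that the order statistics are strictly increasing.

```python
from bisect import bisect_right
from collections import defaultdict


def weighted_ecdf(y, w):
    """Return the distinct sorted values y_(1) < ... < y_(d) and F_hat at each."""
    weight_at = defaultdict(float)
    for value, weight in zip(y, w):
        weight_at[value] += weight          # pool the weights of tied values
    values = sorted(weight_at)
    total = sum(weight_at.values())
    cdf, running = [], 0.0
    for value in values:
        running += weight_at[value]
        cdf.append(running / total)
    cdf[-1] = 1.0                           # guard against round-off at the top
    return values, cdf


def estimate_quantile(y, w, p):
    """Interpolated estimate of the pth quantile from weighted observations."""
    values, cdf = weighted_ecdf(y, w)
    if p < cdf[0]:
        return values[0]                    # first case: p < F_hat(y_(1))
    if p >= 1.0:
        return values[-1]                   # last case: p = 1
    k = bisect_right(cdf, p) - 1            # F_hat(y_(k)) <= p < F_hat(y_(k+1))
    frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
    return values[k] + frac * (values[k + 1] - values[k])
```

For example, with values 10, 20, 30 and weights 1, 2, 1, the estimated distribution function is 0.25, 0.75, 1, so the median is interpolated halfway between 10 and 20.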

Standard Error

When you use VARMETHOD=TAYLOR, or by default if you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses Woodruff’s method (Dorfman and Valliant, 1993; Särndal, Swensson, and Wretman, 1992; Francisco and Fuller, 1991) to estimate the variances of quantiles. This method first constructs a confidence interval on a quantile. Then it uses the width of the confidence interval to estimate the standard error of a quantile.

In order to estimate the variance for $\hat Y_ p$, first the procedure estimates the variance of the estimated distribution function $\hat F(\hat Y_ p)$ by

\[  \hat V(\hat F(\hat Y_ p)) =\sum _{h=1}^ H \frac{n_ h(1-f_ h)}{n_ h-1} ~  \sum _{i=1}^{n_ h} {(d_{hi\cdot }-\bar{d}_{h\cdot \cdot })^2}  \]

where

\[  \begin{array}{rcl} d_{hi\cdot } & = & \displaystyle \left( \sum _{j=1}^{m_{hi}}w_{hij}~ (I(y_{hij} \le \hat Y_ p) - \hat F(\hat Y_ p)) \right) / ~  w_{\cdot \cdot \cdot } \\ \bar{d}_{h\cdot \cdot } & = & \displaystyle \left( \sum _{i=1}^{n_ h}d_{hi\cdot } \right) / ~  n_ h \end{array}  \]
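The variance formula can be illustrated with a minimal Python sketch. The function name `var_cdf_at` and its arguments are hypothetical. The sketch assumes that $w_{\cdot \cdot \cdot }$ is the sum of all sampling weights (the denominator of $\hat F$), that $f_ h$ is the sampling fraction in stratum h (taken as 0 when none is supplied), and that strata with fewer than two clusters contribute nothing; the last point is an assumption of this sketch, not a statement about the procedure.

```python
from collections import defaultdict


def var_cdf_at(y, w, stratum, cluster, t, sampling_fraction=None):
    """Return (F_hat(t), V_hat(F_hat(t))) from member-level data.

    sampling_fraction: optional dict mapping stratum -> f_h (assumed 0 if None).
    """
    w_total = sum(w)                                   # w_... (assumed: all weights)
    f_t = sum(wi for yi, wi in zip(y, w) if yi <= t) / w_total

    # d_hi. : weighted, centered indicators summed within each cluster
    d = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, i in zip(y, w, stratum, cluster):
        indicator = 1.0 if yi <= t else 0.0
        d[h][i] += wi * (indicator - f_t) / w_total

    variance = 0.0
    for h, clusters in d.items():
        n_h = len(clusters)
        if n_h < 2:
            continue                                   # sketch assumption: skip
        f_h = 0.0 if sampling_fraction is None else sampling_fraction[h]
        d_bar = sum(clusters.values()) / n_h           # d-bar_h..
        ss = sum((d_hi - d_bar) ** 2 for d_hi in clusters.values())
        variance += n_h * (1.0 - f_h) / (n_h - 1) * ss
    return f_t, variance
```

With one stratum, four single-member clusters, equal weights, and t at the median, the result reduces to $p(1-p)/(n-1)$ with $p=0.5$ and $n=4$, which is a convenient sanity check.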

Then $100(1-\alpha )$% confidence limits for $\hat F(\hat Y_ p)$ can be constructed by

\[  \left(\hat p_ L, \, \, \, \, \, \,  \hat p_ U\right)=\left(\hat F(\hat Y_ p)-t_{\mi {df},\, \, \alpha /2}\sqrt {\hat V(\hat F(\hat Y_ p))}, \, \, \, \, \, \, \, \,  \hat F(\hat Y_ p)+t_{\mi {df},\, \, \alpha /2}\sqrt {\hat V(\hat F(\hat Y_ p))}\right)  \]

where $t_{\mi {df},\, \, \alpha /2}$ is the $100(1-\alpha /2)$th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom.

When either limit of $(\hat p_ L, \hat p_ U)$ falls outside the range [0,1], the procedure does not compute the standard error.

The $\hat{p}_ L$th quantile is defined as

\[  \hat Y_{\hat p_ L}= \left\{  \begin{array}{ll} y_{(1)} &  \mbox{ if } \hat p_ L<\hat F(y_{(1)}) \\ y_{(k_ L)}+\displaystyle {\frac{\hat p_ L-\hat F(y_{(k_ L)})}{\hat F(y_{(k_ L+1)})-\hat F(y_{(k_ L)})}} (y_{(k_ L+1)}-y_{(k_ L)}) &  \mbox{ if } \hat F(y_{(k_ L)}) \le \hat p_ L < \hat F(y_{(k_ L+1)}) \\ y_{(d)} &  \mbox{ if } \hat p_ L=1 \end{array} \right.  \]

and the $\hat p_ U$th quantile is defined as

\[  \hat Y_{\hat p_ U}= \left\{  \begin{array}{ll} y_{(1)} &  \mbox{ if } \hat p_ U<\hat F(y_{(1)}) \\ y_{(k_ U)}+\displaystyle {\frac{\hat p_ U-\hat F(y_{(k_ U)})}{\hat F(y_{(k_ U+1)})-\hat F(y_{(k_ U)})}} (y_{(k_ U+1)}-y_{(k_ U)}) &  \mbox{ if } \hat F(y_{(k_ U)}) \le \hat p_ U < \hat F(y_{(k_ U+1)}) \\ y_{(d)} &  \mbox{ if } \hat p_ U=1 \end{array} \right.  \]

The standard error of $\hat Y_ p$ is then estimated by

\[  \mr {sd}(\hat Y_ p) = \frac{ \hat Y_{\hat p_ U} - \hat Y_{\hat p_ L} }{2t_{\mi {df},\, \, \alpha /2}}  \]

where $t_{\mi {df},\, \, \alpha /2}$ is the $100(1-\alpha /2)$th percentile of the t distribution with df degrees of freedom.
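The steps of Woodruff's method described above can be sketched as follows. This is a hedged illustration rather than the procedure's code: the names `invert_cdf` and `woodruff_se` are hypothetical, the caller supplies the distinct sorted values with their $\hat F$ values (ending at 1), the estimated variance of $\hat F(\hat Y_ p)$, and the t critical value `t_crit`. The sketch returns `None` when either limit on the probability scale falls outside [0,1], which is how it interprets the rule stated above.

```python
from bisect import bisect_right
from math import sqrt


def invert_cdf(values, cdf, p):
    """Interpolated quantile; values are distinct and sorted, cdf[-1] is 1.0."""
    if p < cdf[0]:
        return values[0]
    if p >= 1.0:
        return values[-1]
    k = bisect_right(cdf, p) - 1             # F_hat(y_(k)) <= p < F_hat(y_(k+1))
    frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
    return values[k] + frac * (values[k + 1] - values[k])


def woodruff_se(values, cdf, p, var_cdf, t_crit):
    """Return (se, y_lower, y_upper), or None if the limits leave [0, 1]."""
    y_p = invert_cdf(values, cdf, p)
    # F_hat is a step function, so evaluate it at the largest value <= y_p
    f_at_yp = cdf[bisect_right(values, y_p) - 1]
    half_width = t_crit * sqrt(var_cdf)
    p_lower, p_upper = f_at_yp - half_width, f_at_yp + half_width
    if p_lower < 0.0 or p_upper > 1.0:
        return None                           # sketch assumption: no SE computed
    y_lower = invert_cdf(values, cdf, p_lower)
    y_upper = invert_cdf(values, cdf, p_upper)
    se = (y_upper - y_lower) / (2.0 * t_crit)
    return se, y_lower, y_upper
```

The design point worth noting is that the interval is built on the probability scale first and only then mapped back through the inverse of $\hat F$, so the same interpolation rule serves for $\hat Y_ p$, $\hat Y_{\hat p_ L}$, and $\hat Y_{\hat p_ U}$.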

When you use a replication method, PROC SURVEYMEANS uses the usual variance estimates for a quantile as described in the section Replication Methods for Variance Estimation. However, you should proceed cautiously because this variance estimator can have poor properties (Dorfman and Valliant, 1993).

Confidence Limits

Symmetric $100(1-\alpha )$% confidence limits are computed as

\[  \left( \hat Y_ p - \mr {sd}(\hat Y_ p) ~ \cdot ~  t_{\mi {df},\, \, \alpha /2} , \, \, \, \, \, \,  \hat Y_ p + \mr {sd}(\hat Y_ p) ~ \cdot ~  t_{\mi {df},\, \, \alpha /2} \right)  \]

If you specify the NONSYMCL option in the PROC SURVEYMEANS statement when you use the VARMETHOD=TAYLOR option, the procedure computes $100(1-\alpha )$% nonsymmetric confidence limits:

\[  \left( \hat Y_{\hat p_ L}, \, \, \, \, \,  \hat Y_{\hat p_ U} \right)  \]
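
As a quick check on how the two intervals relate under VARMETHOD=TAYLOR, substituting the Woodruff standard error into the symmetric limits gives the width of the symmetric interval. This sketch assumes that the same $\alpha $ and df are used for the standard error and for the confidence limits:

\[  2 ~ \cdot ~ \mr {sd}(\hat Y_ p) ~ \cdot ~ t_{\mi {df},\, \, \alpha /2} = 2 ~ \cdot ~ \frac{ \hat Y_{\hat p_ U} - \hat Y_{\hat p_ L} }{2t_{\mi {df},\, \, \alpha /2}} ~ \cdot ~ t_{\mi {df},\, \, \alpha /2} = \hat Y_{\hat p_ U} - \hat Y_{\hat p_ L}  \]

Under that assumption, the symmetric and nonsymmetric intervals have the same width; the symmetric interval is centered at $\hat Y_ p$, whereas the nonsymmetric interval need not be.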