Let Y be the variable of interest in a complex survey. Denote \(F(t)=\Pr(Y\le t)\) as the cumulative distribution function for Y. For \(0<p<1\), the pth quantile of the population cumulative distribution function is
\[ Y_p = \inf\{ y : F(y) \ge p \} \]
Let \(\{y_{hij}, w_{hij}\}\) be the observed values for variable Y and their associated sampling weights, where \((h,i,j)\) are the stratum index, cluster index, and member index, respectively, as shown in the section Definitions and Notation. Let \(y_{(1)}<y_{(2)}<\cdots<y_{(d)}\) denote the sample order statistics for variable Y.
An estimate of the quantile \(Y_p\) is
![\[ \hat Y_ p= \left\{ \begin{array}{ll} y_{(1)} & \mbox{ if } p<\hat F(y_{(1)}) \\ y_{(k)}+\displaystyle {\frac{p-\hat F(y_{(k)})}{\hat F(y_{(k+1)})-\hat F(y_{(k)})}} (y_{(k+1)}-y_{(k)}) & \mbox{ if } \hat F(y_{(k)}) \le p < \hat F(y_{(k+1)}) \\ y_{(d)} & \mbox{ if } p=1 \end{array} \right. \]](images/statug_surveymeans0145.png)
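where \(\hat F(t)\) is the estimated cumulative distribution for Y:
\[ \hat F(t) = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, I(y_{hij}\le t)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}} \]
and \(I(\cdot)\) is the indicator function.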
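The interpolation rule above can be sketched in Python. This is a minimal illustration, not the procedure's implementation: the function names `weighted_cdf` and `estimate_quantile` are hypothetical, the inputs are plain lists of values and weights, and tied values are assumed to be pooled into one order statistic.

```python
def weighted_cdf(values, weights):
    """Return the distinct sorted values and the weighted CDF at each of them."""
    total = sum(weights)
    mass = {}
    for y, w in zip(values, weights):
        mass[y] = mass.get(y, 0.0) + w      # pool the weights of tied values
    ys = sorted(mass)
    cdf, running = [], 0.0
    for y in ys:
        running += mass[y]
        cdf.append(running / total)
    cdf[-1] = 1.0                           # guard against rounding in the last step
    return ys, cdf


def estimate_quantile(values, weights, p):
    """Piecewise-linear quantile estimate based on the weighted CDF."""
    ys, cdf = weighted_cdf(values, weights)
    if p < cdf[0]:
        return ys[0]                        # below the first step: smallest value
    if p >= 1.0:
        return ys[-1]                       # p = 1: largest value
    for k in range(len(ys) - 1):
        if cdf[k] <= p < cdf[k + 1]:        # F(y_(k)) <= p < F(y_(k+1))
            frac = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
            return ys[k] + frac * (ys[k + 1] - ys[k])
    return ys[-1]
```

For example, with values `[1, 2, 3]` and weights `[1, 2, 1]` the estimated distribution takes the values 0.25, 0.75, and 1, so `estimate_quantile([1, 2, 3], [1, 2, 1], 0.5)` interpolates halfway between 1 and 2.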
When you use VARMETHOD=TAYLOR, or by default if you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses Woodruff’s method (Dorfman and Valliant, 1993; Särndal, Swensson, and Wretman, 1992; Francisco and Fuller, 1991) to estimate the variances of quantiles. This method first constructs a confidence interval on a quantile. Then it uses the width of the confidence interval to estimate the standard error of a quantile.
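To estimate the variance of \(\hat Y_p\), the procedure first estimates the variance of the estimated distribution function \(\hat F(\hat Y_p)\) by
\[ \hat V(\hat F(\hat Y_p)) = \sum_{h=1}^{H} \frac{n_h(1-f_h)}{n_h-1} \sum_{i=1}^{n_h} \left( d_{hi\cdot}-\bar d_{h\cdot\cdot} \right)^2 \]
where
\[ d_{hi\cdot} = \frac{1}{w_{\cdot\cdot\cdot}} \sum_{j=1}^{m_{hi}} w_{hij} \left( I(y_{hij}\le \hat Y_p) - \hat F(\hat Y_p) \right) \]
\[ \bar d_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} d_{hi\cdot} \]
\[ w_{\cdot\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \]
and \(H\), \(n_h\), \(f_h\), and \(m_{hi}\) are as defined in the section Definitions and Notation.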
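Then \(100(1-\alpha)\)% confidence limits for \(\hat F(\hat Y_p)\) can be constructed by
\[ (\hat p_L, \hat p_U) = \left( \hat F(\hat Y_p) - t_{df,\alpha/2}\sqrt{\hat V(\hat F(\hat Y_p))},\; \hat F(\hat Y_p) + t_{df,\alpha/2}\sqrt{\hat V(\hat F(\hat Y_p))} \right) \]
where \(t_{df,\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom.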
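The variance step and the resulting probability limits can be sketched in Python as follows. This is a sketch under stated assumptions rather than the procedure's code: it assumes the usual stratified between-cluster Taylor linearization estimator for a ratio of weighted totals, with a finite population correction per stratum. The names `cdf_variance`, `probability_limits`, `sampling_rate`, and `t_crit` are hypothetical, and the t percentile is supplied by the caller because the Python standard library has no t quantile function.

```python
import math
from collections import defaultdict


def cdf_variance(y, w, stratum, cluster, q, sampling_rate=None):
    """Return F_hat(q) and its linearization variance for a stratified cluster sample."""
    sampling_rate = sampling_rate or {}     # assumption: rate 0 when none is given
    w_total = sum(w)
    f_hat = sum(wi for yi, wi in zip(y, w) if yi <= q) / w_total

    # d[h][i] = sum over members of w * (I(y <= q) - F_hat(q)) / w_total
    d = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, i in zip(y, w, stratum, cluster):
        d[h][i] += wi * ((1.0 if yi <= q else 0.0) - f_hat) / w_total

    variance = 0.0
    for h, clusters in d.items():
        n_h = len(clusters)
        if n_h < 2:
            continue                        # assumption: single-cluster strata are skipped
        f_h = sampling_rate.get(h, 0.0)
        mean_d = sum(clusters.values()) / n_h
        ss = sum((v - mean_d) ** 2 for v in clusters.values())
        variance += n_h * (1.0 - f_h) / (n_h - 1) * ss
    return f_hat, variance


def probability_limits(f_hat, variance, t_crit):
    """Limits F_hat -/+ t_crit * sqrt(variance) on the probability scale."""
    half_width = t_crit * math.sqrt(variance)
    return f_hat - half_width, f_hat + half_width
```

As a sanity check, for one stratum of four single-member clusters with equal weights and no finite population correction, the variance at the median reduces to \(p(1-p)/(n-1) = 0.25/3\), which matches the familiar simple random sampling result.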
When \((\hat p_L, \hat p_U)\) falls outside the range [0,1], the procedure does not compute the standard error.
The \(\hat p_L\)th quantile is defined as
![\[ \hat Y_{\hat p_ L}= \left\{ \begin{array}{ll} y_{(1)} & \mbox{ if } \hat p_ L<\hat F(y_{(1)}) \\ y_{(k_ L)}+\displaystyle {\frac{\hat p_ L-\hat F(y_{(k_ L)})}{\hat F(y_{(k_ L+1)})-\hat F(y_{(k_ L)})}} (y_{(k_ L+1)}-y_{(k_ L)}) & \mbox{ if } \hat F(y_{(k_ L)}) \le \hat p_ L < \hat F(y_{(k_ L+1)}) \\ y_{(d)} & \mbox{ if } \hat p_ L=1 \end{array} \right. \]](images/statug_surveymeans0160.png)
and the \(\hat p_U\)th quantile is defined as
![\[ \hat Y_{\hat p_ U}= \left\{ \begin{array}{ll} y_{(1)} & \mbox{ if } \hat p_ U<\hat F(y_{(1)}) \\ y_{(k_ U)}+\displaystyle {\frac{\hat p_ U-\hat F(y_{(k_ U)})}{\hat F(y_{(k_ U+1)})-\hat F(y_{(k_ U)})}} (y_{(k_ U+1)}-y_{(k_ U)}) & \mbox{ if } \hat F(y_{(k_ U)}) \le \hat p_ U < \hat F(y_{(k_ U+1)}) \\ y_{(d)} & \mbox{ if } \hat p_ U=1 \end{array} \right. \]](images/statug_surveymeans0162.png)
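The standard error of \(\hat Y_p\) is then estimated by
\[ se(\hat Y_p) = \frac{\hat Y_{\hat p_U} - \hat Y_{\hat p_L}}{2\, t_{df,\alpha/2}} \]
where \(t_{df,\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the t distribution with df degrees of freedom.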
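The final step of Woodruff's method is simple arithmetic, sketched below in Python. The sketch assumes the standard error is the width of the quantile interval divided by twice the t percentile, which is the usual form of the method. The function name `woodruff_standard_error` and its arguments are hypothetical, and returning `None` stands in for the case where the procedure does not compute a standard error.

```python
def woodruff_standard_error(q_lower, q_upper, p_lower, p_upper, t_crit):
    """Standard error from the interval width, or None if the limits leave [0, 1]."""
    if p_lower < 0.0 or p_upper > 1.0:
        return None                         # probability limits outside [0, 1]
    return (q_upper - q_lower) / (2.0 * t_crit)
```

For instance, if the lower and upper probability limits 0.4 and 0.6 map to the quantiles 18 and 26 and the t percentile is 2, the call `woodruff_standard_error(18.0, 26.0, 0.4, 0.6, 2.0)` gives (26 - 18) / 4.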
When you use the replication method, PROC SURVEYMEANS uses the usual variance estimates for a quantile as described in the section Replication Methods for Variance Estimation. However, you should proceed cautiously because this variance estimator can have poor properties (Dorfman and Valliant, 1993).
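Symmetric \(100(1-\alpha)\)% confidence limits are computed as
\[ \left( \hat Y_p - se(\hat Y_p)\, t_{df,\alpha/2},\; \hat Y_p + se(\hat Y_p)\, t_{df,\alpha/2} \right) \]
If you specify the NONSYMCL option in the PROC SURVEYMEANS statement when you use the VARMETHOD=TAYLOR option, the procedure computes \(100(1-\alpha)\)% nonsymmetric confidence limits:
\[ \left( \hat Y_{\hat p_L},\; \hat Y_{\hat p_U} \right) \]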