# The SURVEYMEANS Procedure

#### Quantiles

Let Y be the variable of interest in a complex survey. Denote $F(y)$ as the cumulative distribution function of Y. For $0 < p < 1$, the pth quantile of the population cumulative distribution function is

$$Q(p) = \inf\{\, y : F(y) \ge p \,\}$$

##### Estimate of Quantile

Let $y_{hij}$ be the observed values for variable Y that are associated with sampling weights $w_{hij}$, where $h$, $i$, and $j$ are the stratum index, cluster index, and member index, respectively, as shown in the section Definitions and Notation. Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ denote the sample order statistics for variable Y.

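An estimate of quantile $Q(p)$ is

$$\hat Q(p) = \begin{cases} y_{(1)} & \text{if } p < \hat F(y_{(1)}) \\[4pt] y_{(k)} + \dfrac{p - \hat F(y_{(k)})}{\hat F(y_{(k+1)}) - \hat F(y_{(k)})}\left(y_{(k+1)} - y_{(k)}\right) & \text{if } \hat F(y_{(k)}) \le p < \hat F(y_{(k+1)}) \end{cases}$$

where $\hat F$ is the estimated cumulative distribution function for Y,

$$\hat F(y) = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, I(y_{hij} \le y)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}}$$

and $I(\cdot)$ is the indicator function.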
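As an illustration only, the following Python sketch computes $\hat F(y)$ as a weighted share and then interpolates between adjacent order statistics. The tuple layout `(h, i, w, y)` and the names `weighted_cdf` and `survey_quantile` are hypothetical, and this is not the SAS implementation. Tied values are collapsed to distinct values, which leaves the result unchanged because the interpolation branch applies only where $\hat F$ strictly increases.

```python
from typing import List, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij)
Obs = Tuple[str, str, float, float]


def weighted_cdf(obs: List[Obs], y: float) -> float:
    """F-hat(y): share of the total weight carried by observations with value <= y."""
    total = sum(w for _, _, w, _ in obs)
    below = sum(w for _, _, w, v in obs if v <= y)
    return below / total


def survey_quantile(obs: List[Obs], p: float) -> float:
    """Interpolated estimate Q-hat(p) built from the weighted CDF, for 0 < p < 1."""
    ys = sorted({v for _, _, _, v in obs})  # distinct order statistics
    f = [weighted_cdf(obs, v) for v in ys]
    if p < f[0]:
        return ys[0]
    for k in range(len(ys) - 1):
        if f[k] <= p < f[k + 1]:
            # Linear interpolation between y_(k) and y_(k+1)
            return ys[k] + (p - f[k]) / (f[k + 1] - f[k]) * (ys[k + 1] - ys[k])
    return ys[-1]  # reached only when p >= F-hat(y_(n)) = 1


if __name__ == "__main__":
    sample = [("A", "1", 1.0, 1.0), ("A", "1", 1.0, 2.0),
              ("A", "2", 1.0, 3.0), ("A", "2", 1.0, 4.0)]
    print(survey_quantile(sample, 0.5))  # 2.0
```

With four equally weighted values, $\hat F(2) = 0.5$ exactly, so the median estimate lands on the second order statistic rather than halfway between 2 and 3.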

##### Standard Error

When you specify VARMETHOD=TAYLOR, or by default if you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses Woodruff’s method (Dorfman and Valliant, 1993; Särndal, Swensson, and Wretman, 1992; Francisco and Fuller, 1991) to estimate the variances of quantiles. This method first constructs a confidence interval on a quantile. Then it uses the width of the confidence interval to estimate the standard error of a quantile.

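To estimate the variance of $\hat Q(p)$, PROC SURVEYMEANS first estimates the variance of the estimated distribution function by

$$\hat V\big(\hat F(\hat Q(p))\big) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(e_{hi\cdot} - \bar e_{h\cdot\cdot}\right)^2$$

where

$$e_{hi\cdot} = \frac{\sum_{j=1}^{m_{hi}} w_{hij}\left(I(y_{hij} \le \hat Q(p)) - \hat F(\hat Q(p))\right)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}}, \qquad \bar e_{h\cdot\cdot} = \frac{1}{n_h}\sum_{i=1}^{n_h} e_{hi\cdot}$$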

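Then $100(1-\alpha)\%$ confidence limits $\{\hat p_L, \hat p_U\}$ for $F(Q(p)) = p$ can be constructed by

$$\hat p_L = p - t_{df,\alpha/2}\sqrt{\hat V\big(\hat F(\hat Q(p))\big)}, \qquad \hat p_U = p + t_{df,\alpha/2}\sqrt{\hat V\big(\hat F(\hat Q(p))\big)}$$

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom.

When $[\hat p_L, \hat p_U]$ is not contained in $[0,1]$, the procedure does not compute the standard error of $\hat Q(p)$.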

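The $\hat p_L$th quantile is defined as

$$\hat Q_L = \hat Q(\hat p_L)$$

and the $\hat p_U$th quantile is defined as

$$\hat Q_U = \hat Q(\hat p_U)$$

The standard error of $\hat Q(p)$ is then estimated by

$$\mathrm{SE}\big(\hat Q(p)\big) = \frac{\hat Q_U - \hat Q_L}{2\, t_{df,\alpha/2}}$$

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom.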
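A minimal sketch of Woodruff's method with the Taylor-series variance of the estimated distribution function, assuming a stratified cluster design with at least two clusters per stratum. The names `cdf_variance` and `woodruff_se`, the `(h, i, w, y)` tuple layout, and the optional per-stratum `rates` mapping for $f_h$ are illustrative assumptions, not SAS internals. The caller supplies the critical value $t_{df,\alpha/2}$; with SciPy available it could come from `scipy.stats.t.ppf(1 - alpha / 2, df)`, but the block itself uses only the standard library.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij)
Obs = Tuple[str, str, float, float]


def cdf(obs: List[Obs], y: float) -> float:
    total = sum(w for _, _, w, _ in obs)
    return sum(w for _, _, w, v in obs if v <= y) / total


def quantile(obs: List[Obs], p: float) -> float:
    ys = sorted({v for _, _, _, v in obs})
    f = [cdf(obs, v) for v in ys]
    if p < f[0]:
        return ys[0]
    for k in range(len(ys) - 1):
        if f[k] <= p < f[k + 1]:
            return ys[k] + (p - f[k]) / (f[k + 1] - f[k]) * (ys[k + 1] - ys[k])
    return ys[-1]


def cdf_variance(obs: List[Obs], q: float, rates: Dict[str, float]) -> float:
    """Taylor-series variance of F-hat evaluated at q (stratified cluster design)."""
    total = sum(w for _, _, w, _ in obs)
    f_q = cdf(obs, q)
    # e_{hi.}: cluster totals of the linearized variable
    e: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for h, i, w, v in obs:
        e[h][i] += w * ((1.0 if v <= q else 0.0) - f_q) / total
    var = 0.0
    for h, clusters in e.items():
        n_h = len(clusters)  # division by zero if a stratum has only one cluster
        e_bar = sum(clusters.values()) / n_h
        ss = sum((x - e_bar) ** 2 for x in clusters.values())
        var += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return var


def woodruff_se(obs: List[Obs], p: float, t_crit: float,
                rates: Optional[Dict[str, float]] = None) -> Optional[float]:
    """Woodruff standard error of Q-hat(p); None when [p_L, p_U] leaves [0, 1]."""
    half = t_crit * math.sqrt(cdf_variance(obs, quantile(obs, p), rates or {}))
    p_low, p_up = p - half, p + half
    if p_low < 0.0 or p_up > 1.0:
        return None
    return (quantile(obs, p_up) - quantile(obs, p_low)) / (2.0 * t_crit)
```

For four single-observation clusters with values 1 to 4 and unit weights, the variance of $\hat F$ at the median is $1/12$; with $t_{df,\alpha/2} = 2$ the interval for $p = 0.5$ already extends below 0, so the sketch returns `None`, mirroring the rule that no standard error is computed in that case.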

When you use the replication method, PROC SURVEYMEANS uses the usual variance estimates for a quantile as described in the section Replication Methods for Variance Estimation. However, you should proceed cautiously, because this variance estimator can have poor properties (Dorfman and Valliant, 1993).

##### Confidence Limits

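Symmetric $100(1-\alpha)\%$ confidence limits are computed as

$$\hat Q(p) \pm t_{df,\alpha/2}\, \mathrm{SE}\big(\hat Q(p)\big)$$

If you specify the NONSYMCL option in the PROC SURVEYMEANS statement when you use the VARMETHOD=TAYLOR option, the procedure computes $100(1-\alpha)\%$ nonsymmetric confidence limits:

$$\big[\hat Q_L,\ \hat Q_U\big]$$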
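The two kinds of limits differ only in where they are centered, as this small hypothetical helper (`quantile_limits`, not a SAS function) shows: because $t_{df,\alpha/2}\,\mathrm{SE}(\hat Q(p)) = (\hat Q_U - \hat Q_L)/2$, the symmetric interval has the same width as $[\hat Q_L, \hat Q_U]$ but is recentered on $\hat Q(p)$.

```python
from typing import Tuple


def quantile_limits(q_hat: float, q_low: float, q_up: float, t_crit: float,
                    nonsym: bool = False) -> Tuple[float, float]:
    """Confidence limits for a quantile from Woodruff's interval.

    q_low and q_up are Q-hat(p_L) and Q-hat(p_U); t_crit is t_{df, alpha/2}.
    """
    if nonsym:
        # NONSYMCL-style limits: the back-transformed interval itself
        return (q_low, q_up)
    se = (q_up - q_low) / (2.0 * t_crit)
    # Symmetric limits: same width as [q_low, q_up], recentered on Q-hat(p)
    return (q_hat - t_crit * se, q_hat + t_crit * se)
```

For example, with $\hat Q(p) = 2$, $\hat Q_L = 1$, $\hat Q_U = 3.5$, and $t_{df,\alpha/2} = 2$, the symmetric limits are $(0.75, 3.25)$ while the nonsymmetric limits stay at $(1, 3.5)$.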

##### Quantile Estimation with Poststratification

When you specify a POSTSTRATA statement, the quantile estimation and its variance estimation incorporate poststratification. For more information about poststratification, see the section Poststratification.

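For a selected sample, let $r = 1, 2, \ldots, R$ be the poststratum index; let $N_r$ be the population totals for each corresponding poststratum; and let $\delta_r(h,i,j)$ be the indicator variable for poststratum r that is defined by

$$\delta_r(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to poststratum } r \\ 0 & \text{otherwise} \end{cases}$$

Denote the total sum of original weights in the sample for each poststratum as

$$\hat N_r = \sum_{h}\sum_{i}\sum_{j} w_{hij}\, \delta_r(h,i,j)$$

Assume that the observation (h, i, j) belongs to the rth poststratum. Then the poststratification weight for the observation (h, i, j) is

$$w^{*}_{hij} = \frac{N_r}{\hat N_r}\, w_{hij}$$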

The estimated cumulative distribution function of Y and the estimated pth quantile can then be computed as in the section Estimate of Quantile by replacing the original weights, $w_{hij}$, with the poststratification weights, $w^{*}_{hij}$.
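Computing the poststratification weights takes one pass to accumulate $\hat N_r$ followed by a rescaling. In this sketch the five-field tuple layout `(h, i, w, y, r)` and the name `poststrat_weights` are assumptions; the returned weights can then stand in for $w_{hij}$ in the weighted CDF and interpolation steps.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij, poststratum r)
PObs = Tuple[str, str, float, float, str]


def poststrat_weights(obs: List[PObs], pop_totals: Dict[str, float]) -> List[float]:
    """Return w*_hij = (N_r / N-hat_r) * w_hij for each observation, in input order."""
    n_hat: Dict[str, float] = defaultdict(float)
    for _, _, w, _, r in obs:
        n_hat[r] += w  # N-hat_r: sample sum of original weights in poststratum r
    return [pop_totals[r] / n_hat[r] * w for _, _, w, _, r in obs]
```

By construction the rescaled weights in each poststratum sum to $N_r$, so their overall sum equals $\sum_r N_r$.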

When you specify VARMETHOD=TAYLOR (or by default), the variance of $\hat Q(p)$ is estimated as in the section Standard Error, except that the variance of the estimated distribution function is computed as follows.

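For each poststratum $r$, define

$$\hat F_r(y) = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, \delta_r(h,i,j)\, I(y_{hij} \le y)}{\hat N_r}$$

where $I(\cdot)$ is the indicator function.

Assume that the observation (h, i, j) belongs to the rth poststratum. Let

$$u_{hij} = \frac{N_r}{N\,\hat N_r}\left(I(y_{hij} \le \hat Q(p)) - \hat F_r(\hat Q(p))\right), \qquad N = \sum_{r=1}^{R} N_r$$

PROC SURVEYMEANS estimates the variance of the estimated distribution function with poststratification by

$$\hat V\big(\hat F(\hat Q(p))\big) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(u_{hi\cdot} - \bar u_{h\cdot\cdot}\right)^2$$

where

$$u_{hi\cdot} = \sum_{j=1}^{m_{hi}} w_{hij}\, u_{hij}, \qquad \bar u_{h\cdot\cdot} = \frac{1}{n_h}\sum_{i=1}^{n_h} u_{hi\cdot}$$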
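The following sketch evaluates the poststratified linearized variance at a given $q = \hat Q(p)$, assuming the quantile has already been computed with the poststratification weights. The function name `poststrat_cdf_variance`, the `(h, i, w, y, r)` tuple layout, and the `rates` argument are hypothetical, and the code assumes every stratum contains at least two clusters.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij, poststratum r)
PObs = Tuple[str, str, float, float, str]


def poststrat_cdf_variance(obs: List[PObs], pop_totals: Dict[str, float], q: float,
                           rates: Optional[Dict[str, float]] = None) -> float:
    """Linearized variance of the poststratified F-hat at q = Q-hat(p)."""
    rates = rates or {}
    big_n = sum(pop_totals.values())              # N = sum of N_r
    n_hat: Dict[str, float] = defaultdict(float)  # N-hat_r
    below: Dict[str, float] = defaultdict(float)  # weight with y <= q, per poststratum
    for _, _, w, v, r in obs:
        n_hat[r] += w
        if v <= q:
            below[r] += w
    f_r = {r: below[r] / n_hat[r] for r in n_hat}  # F-hat_r(q)
    # u_{hi.} = sum_j w_hij * u_hij, accumulated per stratum and cluster
    u: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for h, i, w, v, r in obs:
        indicator = 1.0 if v <= q else 0.0
        u_hij = pop_totals[r] / (big_n * n_hat[r]) * (indicator - f_r[r])
        u[h][i] += w * u_hij
    var = 0.0
    for h, clusters in u.items():
        n_h = len(clusters)  # assumes at least 2 clusters per stratum
        u_bar = sum(clusters.values()) / n_h
        ss = sum((x - u_bar) ** 2 for x in clusters.values())
        var += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return var
```

With a single poststratum, $N_r / N = 1$ and $u_{hij}$ reduces to the unpoststratified linearized value, so the sketch then reproduces the variance from the section Standard Error.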

##### Domain Quantile

Let Y be the variable of interest in a complex survey, and let a subpopulation of interest be domain D. Denote $F_D(y)$ as the cumulative distribution function of Y in domain D. For $0 < p < 1$, the pth quantile of the population cumulative distribution function in domain D is

$$Q_D(p) = \inf\{\, y : F_D(y) \ge p \,\}$$

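Let $I_D(h,i,j)$ be the corresponding indicator variable:

$$I_D(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\ 0 & \text{otherwise} \end{cases}$$

Assume that d of the n observations in the entire sample belong to domain D. Let $y_{D,(1)} \le y_{D,(2)} \le \cdots \le y_{D,(d)}$ denote the order statistics of variable Y for these d observations.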

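The cumulative distribution function of Y in domain D is estimated by

$$\hat F_D(y) = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, I_D(h,i,j)\, I(y_{hij} \le y)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, I_D(h,i,j)}$$

where $I(\cdot)$ is the indicator function. Then the estimated quantile in domain D is

$$\hat Q_D(p) = \begin{cases} y_{D,(1)} & \text{if } p < \hat F_D(y_{D,(1)}) \\[4pt] y_{D,(k)} + \dfrac{p - \hat F_D(y_{D,(k)})}{\hat F_D(y_{D,(k+1)}) - \hat F_D(y_{D,(k)})}\left(y_{D,(k+1)} - y_{D,(k)}\right) & \text{if } \hat F_D(y_{D,(k)}) \le p < \hat F_D(y_{D,(k+1)}) \end{cases}$$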
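A sketch of the domain point estimate, assuming a boolean domain flag on each observation (a hypothetical `(h, i, w, y, in_domain)` layout; the function names are also illustrative). Multiplying each weight by $I_D(h,i,j)$ is the same as summing only over domain rows, so for the point estimate it suffices to filter.

```python
from typing import List, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij, in_domain)
DObs = Tuple[str, str, float, float, bool]


def domain_cdf(obs: List[DObs], y: float) -> float:
    """F-hat_D(y): weights are multiplied by the domain indicator I_D(h,i,j)."""
    total = sum(w for _, _, w, _, d in obs if d)
    return sum(w for _, _, w, v, d in obs if d and v <= y) / total


def domain_quantile(obs: List[DObs], p: float) -> float:
    """Interpolated Q-hat_D(p) over the order statistics of the d domain observations."""
    ys = sorted({v for _, _, _, v, d in obs if d})
    f = [domain_cdf(obs, v) for v in ys]
    if p < f[0]:
        return ys[0]
    for k in range(len(ys) - 1):
        if f[k] <= p < f[k + 1]:
            return ys[k] + (p - f[k]) / (f[k + 1] - f[k]) * (ys[k + 1] - ys[k])
    return ys[-1]
```

Observations outside D never enter either the numerator or the denominator of $\hat F_D$, so their values have no effect on $\hat Q_D(p)$.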

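To estimate the variance of $\hat Q_D(p)$, PROC SURVEYMEANS first estimates the variance of the estimated distribution function in domain D. When you specify VARMETHOD=TAYLOR (or by default), the variance of $\hat F_D(\hat Q_D(p))$ is estimated by

$$\hat V\big(\hat F_D(\hat Q_D(p))\big) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(e_{hi\cdot} - \bar e_{h\cdot\cdot}\right)^2$$

where

$$e_{hi\cdot} = \frac{\sum_{j=1}^{m_{hi}} w_{hij}\, I_D(h,i,j)\left(I(y_{hij} \le \hat Q_D(p)) - \hat F_D(\hat Q_D(p))\right)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, I_D(h,i,j)}, \qquad \bar e_{h\cdot\cdot} = \frac{1}{n_h}\sum_{i=1}^{n_h} e_{hi\cdot}$$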

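Then $100(1-\alpha)\%$ confidence limits for $F_D(Q_D(p)) = p$ can be constructed by $\{\hat p_{D,L}, \hat p_{D,U}\}$, where

$$\hat p_{D,L} = p - t_{df,\alpha/2}\sqrt{\hat V\big(\hat F_D(\hat Q_D(p))\big)}, \qquad \hat p_{D,U} = p + t_{df,\alpha/2}\sqrt{\hat V\big(\hat F_D(\hat Q_D(p))\big)}$$

and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom, described in the section Degrees of Freedom. When $[\hat p_{D,L}, \hat p_{D,U}]$ is not contained in $[0,1]$, PROC SURVEYMEANS does not compute the standard error of $\hat Q_D(p)$.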
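The variance step is where a domain analysis differs from simply subsetting the data: the sum over $i$ runs over all $n_h$ sampled clusters in each stratum, and clusters with no domain observations contribute $e_{hi\cdot} = 0$. The following sketch (hypothetical name `domain_cdf_variance`, a `(h, i, w, y, in_domain)` tuple layout, and at least two clusters per stratum assumed) makes that explicit.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (stratum h, cluster i, weight w_hij, value y_hij, in_domain)
DObs = Tuple[str, str, float, float, bool]


def domain_cdf_variance(obs: List[DObs], q: float,
                        rates: Optional[Dict[str, float]] = None) -> float:
    """Taylor-series variance of F-hat_D evaluated at q = Q-hat_D(p)."""
    rates = rates or {}
    w_d = sum(w for _, _, w, _, d in obs if d)  # sum of w_hij * I_D(h,i,j)
    f_d = sum(w for _, _, w, v, d in obs if d and v <= q) / w_d
    e: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for h, i, w, v, d in obs:
        # Every sampled cluster gets an entry; rows outside D add 0.0
        e[h][i] += (w * ((1.0 if v <= q else 0.0) - f_d) / w_d) if d else 0.0
    var = 0.0
    for h, clusters in e.items():
        n_h = len(clusters)  # all sampled clusters in stratum h, assumed >= 2
        e_bar = sum(clusters.values()) / n_h
        ss = sum((x - e_bar) ** 2 for x in clusters.values())
        var += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return var
```

For example, with four single-observation clusters in one stratum and only the first two in D, the function keeps $n_h = 4$ and returns $1/6$ at $q = 1$; running it on the two domain rows alone would use $n_h = 2$ and give $0.25$ instead.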

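The $\hat p_{D,L}$th quantile is then estimated as

$$\hat Q_{D,L} = \hat Q_D(\hat p_{D,L})$$

The $\hat p_{D,U}$th quantile is then estimated as

$$\hat Q_{D,U} = \hat Q_D(\hat p_{D,U})$$

The standard error of $\hat Q_D(p)$ is then estimated by

$$\mathrm{SE}\big(\hat Q_D(p)\big) = \frac{\hat Q_{D,U} - \hat Q_{D,L}}{2\, t_{df,\alpha/2}}$$

where $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom.

Symmetric $100(1-\alpha)\%$ confidence limits for $Q_D(p)$ are computed as

$$\hat Q_D(p) \pm t_{df,\alpha/2}\, \mathrm{SE}\big(\hat Q_D(p)\big)$$

If you specify the NONSYMCL option in the PROC SURVEYMEANS statement, the procedure displays $100(1-\alpha)\%$ nonsymmetric confidence limits as

$$\big[\hat Q_{D,L},\ \hat Q_{D,U}\big]$$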

##### Domain Quantile Estimation with Poststratification

When you specify both a POSTSTRATA statement and a DOMAIN statement, the domain quantile estimation and its variance estimation incorporate poststratification. For more information about poststratification, see the section Poststratification.

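For a selected sample, let $r = 1, 2, \ldots, R$ be the poststratum index, let $N_r$ be the population totals for each corresponding poststratum, and let $\delta_r(h,i,j)$ be the indicator variable for poststratum r:

$$\delta_r(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to poststratum } r \\ 0 & \text{otherwise} \end{cases}$$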

The poststratification weights, $w^{*}_{hij}$, are defined as in the section Quantile Estimation with Poststratification.

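For domain D, let $I_D(h,i,j)$ be the corresponding indicator variable:

$$I_D(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\ 0 & \text{otherwise} \end{cases}$$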

With poststratification, the estimated cumulative distribution function of Y in domain D, $\hat F_D(y)$, and its pth quantile estimate, $\hat Q_D(p)$, can be computed as in the section Domain Quantile by replacing the original weights, $w_{hij}$, with the poststratification weights, $w^{*}_{hij}$. However, when you specify the VARMETHOD=TAYLOR option (or by default), the variance of $\hat F_D(\hat Q_D(p))$, which is used in the section Domain Quantile, is computed as follows.

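Define

$$\hat N_D = \sum_{h}\sum_{i}\sum_{j} w^{*}_{hij}\, I_D(h,i,j), \qquad e_{hij} = I_D(h,i,j)\left(I(y_{hij} \le \hat Q_D(p)) - \hat F_D(\hat Q_D(p))\right)$$

and, for each poststratum $r$,

$$\bar e_r = \frac{1}{\hat N_r}\sum_{h}\sum_{i}\sum_{j} w_{hij}\, \delta_r(h,i,j)\, e_{hij}$$

Assume that the observation (h, i, j) belongs to the rth poststratum, and let

$$u_{hij} = \frac{N_r}{\hat N_r\, \hat N_D}\left(e_{hij} - \bar e_r\right)$$

Then the variance of $\hat F_D(\hat Q_D(p))$ is estimated by

$$\hat V\big(\hat F_D(\hat Q_D(p))\big) = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(u_{hi\cdot} - \bar u_{h\cdot\cdot}\right)^2$$

where

$$u_{hi\cdot} = \sum_{j=1}^{m_{hi}} w_{hij}\, u_{hij}, \qquad \bar u_{h\cdot\cdot} = \frac{1}{n_h}\sum_{i=1}^{n_h} u_{hi\cdot}$$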
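Putting the poststratification and domain pieces together, the following sketch evaluates the linearized variance of the poststratified domain distribution function at $q = \hat Q_D(p)$. It is illustrative only: the name `domain_poststrat_cdf_variance`, the `(h, i, w, y, in_domain, r)` tuple layout, and the `rates` argument are assumptions, and every stratum is assumed to contain at least two clusters.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (stratum h, cluster i, w_hij, y_hij, in_domain, poststratum r)
Row = Tuple[str, str, float, float, bool, str]


def domain_poststrat_cdf_variance(obs: List[Row], pop_totals: Dict[str, float], q: float,
                                  rates: Optional[Dict[str, float]] = None) -> float:
    """Linearized variance of the poststratified F-hat_D at q = Q-hat_D(p)."""
    rates = rates or {}
    n_hat: Dict[str, float] = defaultdict(float)          # N-hat_r
    for _, _, w, _, _, r in obs:
        n_hat[r] += w
    ratio = {r: pop_totals[r] / n_hat[r] for r in n_hat}  # N_r / N-hat_r
    # Poststratified domain size N-hat_D and domain CDF F-hat_D(q)
    n_d = sum(ratio[r] * w for _, _, w, _, d, r in obs if d)
    f_d = sum(ratio[r] * w for _, _, w, v, d, r in obs if d and v <= q) / n_d

    def e_of(v: float, d: bool) -> float:
        """e_hij = I_D * (I(y <= q) - F-hat_D(q))."""
        return ((1.0 if v <= q else 0.0) - f_d) if d else 0.0

    e_sum: Dict[str, float] = defaultdict(float)
    for _, _, w, v, d, r in obs:
        e_sum[r] += w * e_of(v, d)
    e_bar = {r: e_sum[r] / n_hat[r] for r in n_hat}       # e-bar_r
    u: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for h, i, w, v, d, r in obs:
        u_hij = ratio[r] / n_d * (e_of(v, d) - e_bar[r])
        u[h][i] += w * u_hij                              # u_{hi.}
    var = 0.0
    for h, clusters in u.items():
        n_h = len(clusters)  # assumes at least 2 clusters per stratum
        u_bar = sum(clusters.values()) / n_h
        ss = sum((x - u_bar) ** 2 for x in clusters.values())
        var += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return var
```

When every observation is in D and there is a single poststratum, $\bar e_r = 0$ and the result reduces to the poststratified variance of the full-sample distribution function.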