The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles (quantiles), as well as the minimum and maximum of each analysis variable. To compute percentiles other than these default percentiles, use the PCTLPTS= and PCTLPRE= options in the OUTPUT statement.
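You can specify one of five definitions for computing the percentiles with the PCTLDEF= option. Let $n$ be the number of nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of the variable. Let the $t$th percentile be $y$, set $p = t/100$, and let

$$
\begin{aligned}
np &= j + g && \text{when PCTLDEF=1, 2, 3, or 5} \\
(n+1)p &= j + g && \text{when PCTLDEF=4}
\end{aligned}
$$

where $j$ is the integer part of $np$ (of $(n+1)p$ when PCTLDEF=4), and $g$ is the corresponding fractional part. Then the PCTLDEF= option defines the $t$th percentile, $y$, as described in the following table: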
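PCTLDEF= | Description | Formula |
---|---|---|
1 | weighted average at $x_{np}$ | $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_0$ is taken to be $x_1$ |
2 | observation numbered closest to $np$ | $y = x_i$ if $g \ne \tfrac{1}{2}$; $y = x_j$ if $g = \tfrac{1}{2}$ and $j$ is even; $y = x_{j+1}$ if $g = \tfrac{1}{2}$ and $j$ is odd; where $i$ is the integer part of $np + \tfrac{1}{2}$ |
3 | empirical distribution function | $y = x_j$ if $g = 0$; $y = x_{j+1}$ if $g > 0$ |
4 | weighted average aimed at $x_{(n+1)p}$ | $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_{n+1}$ is taken to be $x_n$ |
5 | empirical distribution function with averaging | $y = \tfrac{1}{2}(x_j + x_{j+1})$ if $g = 0$; $y = x_{j+1}$ if $g > 0$ |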
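To make the five definitions concrete outside SAS, the following Python sketch computes a percentile under each PCTLDEF= value. The function name `sas_percentile` and the helper `at` are hypothetical, and clamping out-of-range indices for every definition is an assumption that generalizes the endpoint rules stated for definitions 1 and 4. It is an illustration of the formulas, not the procedure's implementation.

```python
import math

def sas_percentile(values, t, pctldef=5):
    """Sketch of the t-th percentile (0 < t < 100) under PCTLDEF=1..5."""
    x = sorted(values)                  # ordered values x_1 <= ... <= x_n
    n = len(x)
    # (n * t) / 100 keeps the product exact when n * t is a multiple of 100,
    # so the test g == 0 is not disturbed by rounding in t / 100.
    target = (n + 1) * t / 100 if pctldef == 4 else n * t / 100
    j = math.floor(target)              # integer part
    g = target - j                      # fractional part

    def at(k):
        # 1-based lookup clamped to [1, n]. Assumption: the rules
        # "x_0 is taken to be x_1" and "x_{n+1} is taken to be x_n"
        # are applied uniformly to every definition.
        return x[min(max(k, 1), n) - 1]

    if pctldef in (1, 4):               # weighted averages
        return (1 - g) * at(j) + g * at(j + 1)
    if pctldef == 2:                    # observation numbered closest to np
        if g != 0.5:
            return at(math.floor(target + 0.5))
        return at(j) if j % 2 == 0 else at(j + 1)
    if pctldef == 3:                    # empirical distribution function
        return at(j) if g == 0 else at(j + 1)
    if pctldef == 5:                    # EDF with averaging
        return (at(j) + at(j + 1)) / 2 if g == 0 else at(j + 1)
    raise ValueError("pctldef must be 1, 2, 3, 4, or 5")
```

For eight ordered values 10, 20, ..., 80 and t = 25, np = 2 exactly, so definitions 1, 2, and 3 all return the second value, definition 4 interpolates at (n+1)p = 2.25, and definition 5 averages the second and third values.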
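When you use a WEIGHT statement, the percentiles are computed differently. The $100p$th weighted percentile $y$ is computed from the empirical distribution function with averaging

$$
y =
\begin{cases}
\tfrac{1}{2}\,(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\[4pt]
x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j
\end{cases}
$$

where $w_i$ is the weight associated with $x_i$, and where $W = \sum_{i=1}^{n} w_i$ is the sum of the weights.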
Note that the PCTLDEF= option is not applicable when a WEIGHT statement is used. However, in this case, if all the weights are identical, the weighted percentiles are the same as the percentiles that would be computed without a WEIGHT statement and with PCTLDEF=5.
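The weighted rule can be sketched in Python as a single pass over the cumulative weights. The function name `weighted_percentile` is hypothetical, and the sketch assumes strictly positive weights and 0 < p < 1; how SAS treats zero or missing weights is not covered here.

```python
import math

def weighted_percentile(values, weights, p):
    """Sketch of the 100p-th weighted percentile (EDF with averaging)."""
    pairs = sorted(zip(values, weights))      # order observations by value
    total = sum(w for _, w in pairs)          # W, the sum of the weights
    target = p * total                        # pW
    cum = 0.0
    for i, (x, w) in enumerate(pairs):
        cum += w                              # running sum w_1 + ... + w_i
        if math.isclose(cum, target, rel_tol=1e-12):
            # Cumulative weight hits pW exactly: average x_i and x_{i+1}.
            nxt = pairs[i + 1][0] if i + 1 < len(pairs) else x
            return (x + nxt) / 2
        if cum > target:
            # First observation whose cumulative weight exceeds pW.
            return x
    return pairs[-1][0]
```

With equal weights the result agrees with the unweighted PCTLDEF=5 rule: for the values 10, 20, ..., 80 with unit weights and p = 0.25, the cumulative weight reaches pW = 2 at the second observation, so the result is the average of 20 and 30.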
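You can use the CIPCTLNORMAL option to request confidence limits for percentiles that assume the data are normally distributed. These limits are described in Section 4.4.1 of Hahn and Meeker (1991). When $0 < p < \tfrac{1}{2}$, the two-sided $100(1-\alpha)$% confidence limits for the $100p$-th percentile are

$$
\begin{aligned}
\text{lower limit} &= \bar{X} - g'(1-\alpha/2;\ 1-p,\ n)\, s \\
\text{upper limit} &= \bar{X} - g'(\alpha/2;\ 1-p,\ n)\, s
\end{aligned}
$$

where $n$ is the sample size, $\bar{X}$ is the sample mean, and $s$ is the sample standard deviation. When $\tfrac{1}{2} \le p < 1$, the two-sided $100(1-\alpha)$% confidence limits for the $100p$-th percentile are

$$
\begin{aligned}
\text{lower limit} &= \bar{X} + g'(\alpha/2;\ p,\ n)\, s \\
\text{upper limit} &= \bar{X} + g'(1-\alpha/2;\ p,\ n)\, s
\end{aligned}
$$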
One-sided $100(1-\alpha)$% confidence bounds are computed by replacing $\alpha/2$ by $\alpha$ in the appropriate preceding equation. The factor $g'(\gamma;\ p,\ n)$ is related to the noncentral $t$ distribution and is described in Owen and Hua (1977) and Odeh and Owen (1980).
You can use the CIPCTLDF option to request confidence limits for percentiles that are distribution free (in particular, it is not necessary to assume that the data are normally distributed). These limits are described in Section 5.2 of Hahn and Meeker (1991). The two-sided $100(1-\alpha)$% confidence limits for the $100p$-th percentile are

$$
\begin{aligned}
\text{lower limit} &= X_{(l)} \\
\text{upper limit} &= X_{(u)}
\end{aligned}
$$

where $X_{(j)}$ is the $j$th order statistic when the data values are arranged in increasing order:

$$
X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}
$$

The lower rank $l$ and upper rank $u$ are integers that are symmetric (or nearly symmetric) around $[np] + 1$, where $[np]$ is the integer part of $np$, and where $n$ is the sample size. Furthermore, $l$ and $u$ are chosen so that $X_{(l)}$ and $X_{(u)}$ are as close to $X_{([n+1]p)}$ as possible while satisfying the coverage probability requirement

$$
Q(u-1;\ n,\ p) - Q(l-1;\ n,\ p) \ge 1 - \alpha
$$

where $Q(k;\ n,\ p)$ is the cumulative binomial probability

$$
Q(k;\ n,\ p) = \sum_{i=0}^{k} \binom{n}{i}\, p^{i} (1-p)^{n-i}
$$
In some cases, the coverage requirement cannot be met, particularly when n is small and p is near 0 or 1. To relax the requirement of symmetry, you can specify CIPCTLDF( TYPE = ASYMMETRIC ). This option requests symmetric limits when the coverage requirement can be met, and asymmetric limits otherwise.
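The rank search can be illustrated with a short Python sketch. The names `binom_cdf`, `coverage`, and `df_ranks` are hypothetical, and the search order (the two nearly symmetric pairs before the exactly symmetric pair at each distance from the center) is an assumption made for illustration; the source does not spell out how the procedure breaks ties or what it reports when no pair qualifies.

```python
from math import comb, floor

def binom_cdf(k, n, p):
    """Cumulative binomial probability Q(k; n, p) = P(B <= k)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(min(k, n) + 1))

def coverage(l, u, n, p):
    """Coverage probability of the interval [X_(l), X_(u)]."""
    return binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)

def df_ranks(n, p, alpha=0.05):
    """Return ranks (l, u) centered near [np] + 1, or None if none qualify."""
    c = floor(n * p) + 1
    for d in range(1, n + 1):
        # Assumed order: narrower, nearly symmetric pairs first,
        # then the exactly symmetric pair at distance d.
        for l, u in ((c - d, c + d - 1), (c - d + 1, c + d), (c - d, c + d)):
            if 1 <= l < u <= n and coverage(l, u, n, p) >= 1 - alpha:
                return l, u
    return None  # coverage requirement cannot be met with ranks in 1..n
```

For the median with n = 10 and alpha = 0.05, this search returns ranks 2 and 9, whose coverage is 1002/1024 (about 0.979). With n = 5 no pair of ranks reaches 0.95 (the widest pair, 1 and 5, covers only 0.9375), which is the small-sample failure described above.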
If you specify CIPCTLDF( TYPE = LOWER ), a one-sided $100(1-\alpha)$% lower confidence bound is computed as $X_{(l)}$, where $l$ is the largest integer that satisfies the inequality

$$
1 - Q(l-1;\ n,\ p) \ge 1 - \alpha
$$

with $0 < l \le n$. Likewise, if you specify CIPCTLDF( TYPE = UPPER ), a one-sided $100(1-\alpha)$% upper confidence bound is computed as $X_{(u)}$, where $u$ is the smallest integer that satisfies the inequality

$$
Q(u-1;\ n,\ p) \ge 1 - \alpha
$$

where $0 < u \le n$.
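The one-sided ranks follow directly from the cumulative binomial probability, as the Python sketch below shows. The function names are hypothetical. The sketch takes the lower rank as the largest l with 1 - Q(l-1; n, p) >= 1 - alpha and the upper rank as the smallest u with Q(u-1; n, p) >= 1 - alpha, and it returns `None` when no rank in 1..n qualifies, which is an assumption about how to signal that case.

```python
from math import comb

def binom_cdf(k, n, p):
    """Cumulative binomial probability Q(k; n, p) = P(B <= k)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(min(k, n) + 1))

def lower_rank(n, p, alpha=0.05):
    """Largest l in 1..n with 1 - Q(l-1; n, p) >= 1 - alpha, else None."""
    for l in range(n, 0, -1):           # scan from the largest rank down
        if 1 - binom_cdf(l - 1, n, p) >= 1 - alpha:
            return l
    return None

def upper_rank(n, p, alpha=0.05):
    """Smallest u in 1..n with Q(u-1; n, p) >= 1 - alpha, else None."""
    for u in range(1, n + 1):           # scan from the smallest rank up
        if binom_cdf(u - 1, n, p) >= 1 - alpha:
            return u
    return None
```

For the median with n = 10 and alpha = 0.05, the lower bound is the 2nd order statistic and the upper bound is the 9th. With n = 3 neither bound exists at that confidence level.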
Note that confidence limits for percentiles are not computed when a WEIGHT statement is specified.