The BOXPLOT Procedure

Percentile Definitions

You can use the PCTLDEF= option to specify one of five definitions for computing quantile statistics (percentiles). Suppose that n is the number of nonmissing values for a variable and that $x_{1}, x_{2},\ldots ,x_{n}$ represent the values of the analysis variable sorted in ascending order. For the tth percentile, set $p =t/100$.

For definitions 1, 2, 3, and 5, express $np$ as

\[  np = j + g \]

where j is the integer part of $np$ and g is its fractional part. For definition 4, decompose $(n+1)p$ in the same way:

\[ (n+1)p=j+g \]
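
The decomposition into j and g can be computed directly. The following Python sketch is illustrative only and is not part of SAS; the function name `split_np` is hypothetical. It uses `fractions.Fraction` so that $p = t/100$ and g are exact, which matters because the definitions below branch on tests such as $g = 0$ and $g = 1/2$ that floating-point arithmetic can get wrong. This is not necessarily how SAS performs the computation internally.

```python
from fractions import Fraction
import math


def split_np(n, t, definition):
    """Return (j, g) with np = j + g, or (n + 1)p = j + g for definition 4.

    t is the percentile, given as an int or a decimal string such as "2.5"
    so that p = t/100 is represented exactly.
    """
    p = Fraction(t) / 100
    scale = n + 1 if definition == 4 else n
    value = scale * p
    j = math.floor(value)   # integer part
    g = value - j           # fractional part, 0 <= g < 1
    return j, g


# n = 10 observations, 25th percentile
print(split_np(10, 25, 1))  # np = 2.5       -> (2, Fraction(1, 2))
print(split_np(10, 25, 4))  # (n + 1)p = 2.75 -> (2, Fraction(3, 4))
```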

The tth percentile, denoted y, is then defined as follows:

PCTLDEF=1

weighted average at $x_{np}$

\[  y = (1 - g)x_ j + gx_{j+1}  \]

where $x_0$ is taken to be $x_1$.
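
A minimal Python sketch of definition 1 follows, assuming the input contains only nonmissing numeric values and that $0 \le t \le 100$. The name `pctldef1` is hypothetical; SAS does not expose such a function. Besides substituting $x_1$ for $x_0$ as stated above, the helper maps an index above n to $x_n$; that case only occurs when $g = 0$, where the term has zero weight, so it does not change the result.

```python
from fractions import Fraction
import math


def pctldef1(values, t):
    """PCTLDEF=1: weighted average at x_{np}."""
    x = sorted(values)            # x[i - 1] holds x_i
    n = len(x)
    np_ = n * Fraction(t) / 100   # exact np
    j = math.floor(np_)
    g = np_ - j

    def obs(i):
        # x_0 is taken to be x_1; an index above n only appears with weight g = 0
        return x[min(max(i, 1), n) - 1]

    return float((1 - g) * obs(j) + g * obs(j + 1))


data = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5]   # the values 1..10, unsorted
print(pctldef1(data, 25))   # np = 2.5, halfway between x_2 and x_3 -> 2.5
print(pctldef1(data, 50))   # np = 5, g = 0, so y = x_5 -> 5.0
```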

PCTLDEF=2

observation numbered closest to $np$

\[ y = x_ i \]

where i is the integer part of $np + 1/2$ if $g \neq 1/2$. If $g=1/2$, then $y=x_ j$ if j is even, or $y=x_{j+1}$ if j is odd; that is, a tie is resolved in favor of the even-numbered observation.
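
A Python sketch of definition 2, again assuming nonmissing numeric input with $0 \le t \le 100$ and using the hypothetical name `pctldef2`. The index i evaluates to 0 when $np < 1/2$; the text above does not say how that case is handled, so the sketch assumes $x_1$ is returned. Treat that branch as a guess rather than documented SAS behavior.

```python
from fractions import Fraction
import math


def pctldef2(values, t):
    """PCTLDEF=2: observation numbered closest to np."""
    x = sorted(values)            # x[i - 1] holds x_i
    n = len(x)
    np_ = n * Fraction(t) / 100
    j = math.floor(np_)
    g = np_ - j
    if g != Fraction(1, 2):
        i = math.floor(np_ + Fraction(1, 2))   # round np to the nearest index
    else:
        i = j if j % 2 == 0 else j + 1        # tie: choose the even index
    i = max(i, 1)   # assumption: i = 0 (np < 1/2) falls back to x_1
    return x[i - 1]


data = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5]   # the values 1..10, unsorted
print(pctldef2(data, 27))   # np = 2.7 -> x_3 = 3
print(pctldef2(data, 25))   # np = 2.5, j = 2 is even -> x_2 = 2
print(pctldef2(data, 35))   # np = 3.5, j = 3 is odd -> x_4 = 4
```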

PCTLDEF=3

empirical distribution function

\[  y = x_ j ~  \mbox{if}~  g = 0  \]
\[  y=x_{j+1}~  \mbox{if}~  g > 0  \]
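
Definition 3 reduces to choosing a single observation, as in this Python sketch (hypothetical name `pctldef3`; nonmissing numeric input and $0 \le t \le 100$ assumed). For $t = 0$ the formula asks for $x_0$, which the text does not define; the sketch assumes $x_1$.

```python
from fractions import Fraction
import math


def pctldef3(values, t):
    """PCTLDEF=3: empirical distribution function."""
    x = sorted(values)            # x[i - 1] holds x_i
    n = len(x)
    np_ = n * Fraction(t) / 100
    j = math.floor(np_)
    g = np_ - j
    i = j if g == 0 else j + 1    # y = x_j if g = 0, else x_{j+1}
    i = max(i, 1)                 # assumption: t = 0 returns x_1
    return x[i - 1]


data = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5]   # the values 1..10, unsorted
print(pctldef3(data, 50))   # np = 5, g = 0 -> x_5 = 5
print(pctldef3(data, 51))   # np = 5.1, g > 0 -> x_6 = 6
```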

PCTLDEF=4

weighted average aimed at $x_{p(n+1)}$

\[  y=(1 - g)x_ j + gx_{j+1}  \]

where $x_{n+1}$ is taken to be $x_ n$.
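
A Python sketch of definition 4 (hypothetical name `pctldef4`; nonmissing numeric input and $0 \le t \le 100$ assumed). It decomposes $(n+1)p$ instead of $np$ and substitutes $x_n$ for any index above n, as stated above. When $(n+1)p < 1$ the formula also needs $x_0$, which the text does not define; the sketch assumes $x_1$, which is a guess rather than documented behavior.

```python
from fractions import Fraction
import math


def pctldef4(values, t):
    """PCTLDEF=4: weighted average aimed at x_{p(n+1)}."""
    x = sorted(values)                # x[i - 1] holds x_i
    n = len(x)
    v = (n + 1) * Fraction(t) / 100   # exact (n + 1)p
    j = math.floor(v)
    g = v - j

    def obs(i):
        # x_{n+1} is taken to be x_n; mapping x_0 to x_1 is an added assumption
        return x[min(max(i, 1), n) - 1]

    return float((1 - g) * obs(j) + g * obs(j + 1))


data = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5]   # the values 1..10, unsorted
print(pctldef4(data, 25))    # (n + 1)p = 2.75 -> 2.75
print(pctldef4(data, 100))   # (n + 1)p = 11, x_11 taken to be x_10 -> 10.0
```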

PCTLDEF=5

empirical distribution function with averaging

\[  y = (x_ j + x_{j+1})/2 ~  \mbox{if}~  g = 0  \]
\[  y = x_{j+1}~  \mbox{if}~  g > 0  \]
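
A Python sketch of definition 5 (hypothetical name `pctldef5`; nonmissing numeric input and $0 \le t \le 100$ assumed). At the extremes $t = 0$ and $t = 100$ the averaging rule refers to $x_0$ and $x_{n+1}$, which the text does not define; the sketch assumes $x_1$ and $x_n$ respectively, so those percentiles return the minimum and the maximum.

```python
from fractions import Fraction
import math


def pctldef5(values, t):
    """PCTLDEF=5: empirical distribution function with averaging."""
    x = sorted(values)            # x[i - 1] holds x_i
    n = len(x)
    np_ = n * Fraction(t) / 100
    j = math.floor(np_)
    g = np_ - j

    def obs(i):
        # assumption: x_0 maps to x_1 and x_{n+1} maps to x_n at the extremes
        return x[min(max(i, 1), n) - 1]

    if g == 0:
        return (obs(j) + obs(j + 1)) / 2   # average the two neighbors
    return obs(j + 1)


data = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5]   # the values 1..10, unsorted
print(pctldef5(data, 50))   # np = 5, g = 0 -> (x_5 + x_6)/2 = 5.5
print(pctldef5(data, 25))   # np = 2.5, g > 0 -> x_3 = 3
```

At $t = 50$ this definition reproduces the conventional median: for even n, $np = n/2$ has $g = 0$ and the two middle values are averaged; for odd n, $g = 1/2 > 0$ and the single middle observation is returned.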