The UNIVARIATE Procedure

Distributions for Probability and Q-Q Plots

You can use the PROBPLOT and QQPLOT statements to request probability and Q-Q plots that are based on the theoretical distributions summarized in Table 4.33.

Table 4.33: Distributions and Parameters

| Distribution | Density Function $p(x)$ | Range | Location Parameter | Scale Parameter | Shape Parameter |
|---|---|---|---|---|---|
| Beta | $\frac{(x-\theta )^{\alpha -1}(\theta +\sigma -x)^{\beta -1}}{B(\alpha ,\beta )\sigma ^{(\alpha +\beta -1)}}$ | $\theta < x <\theta +\sigma $ | $\theta $ | $\sigma $ | $\alpha $, $\beta $ |
| Exponential | $\frac{1}{\sigma }\exp \left(-\frac{x-\theta }{\sigma }\right)$ | $x \geq \theta $ | $\theta $ | $\sigma $ | |
| Gamma | $\frac{1}{\sigma \Gamma (\alpha )} \left(\frac{x-\theta }{\sigma }\right)^{\alpha -1} \exp \left(-\frac{x-\theta }{\sigma }\right)$ | $x>\theta $ | $\theta $ | $\sigma $ | $\alpha $ |
| Gumbel | $\frac{e^{-(x-\mu )/\sigma }}{\sigma } \exp \left( -e^{-(x-\mu )/\sigma }\right)$ | all $x$ | $\mu $ | $\sigma $ | |
| Lognormal (3-parameter) | $\frac{1}{\sigma \sqrt {2\pi }(x-\theta )} \exp \left(-\frac{(\log (x-\theta )-\zeta )^{2}}{2\sigma ^{2}}\right)$ | $x>\theta $ | $\theta $ | $\zeta $ | $\sigma $ |
| Normal | $\frac{1}{\sigma \sqrt {2\pi }} \exp \left(-\frac{(x-\mu )^2}{2\sigma ^{2}}\right)$ | all $x$ | $\mu $ | $\sigma $ | |
| Generalized Pareto | $\frac{1}{\sigma }{(1 - \alpha (x-\theta )/\sigma )}^{1/\alpha -1}$ if $\alpha \neq 0$; $\frac{1}{\sigma }\exp (-(x-\theta )/\sigma )$ if $\alpha = 0$ | $x > \theta $ | $\theta $ | $\sigma $ | $\alpha $ |
| Power Function | $\frac{\alpha }{\sigma }\left(\frac{x-\theta }{\sigma }\right)^{\alpha -1}$ | $\theta < x < \theta + \sigma $ | $\theta $ | $\sigma $ | $\alpha $ |
| Rayleigh | $\frac{x-\theta }{\sigma ^2}\exp (-(x-\theta )^2/(2\sigma ^2))$ | $x \geq \theta $ | $\theta $ | $\sigma $ | |
| Weibull (3-parameter) | $\frac{c}{\sigma }\left(\frac{x-\theta }{\sigma }\right)^{c-1} \exp \left(-\left(\frac{x-\theta }{\sigma }\right)^{c}\right)$ | $x>\theta $ | $\theta $ | $\sigma $ | $c$ |
| Weibull (2-parameter) | $\frac{c}{\sigma }\left(\frac{x-\theta _0}{\sigma }\right)^{c-1} \exp \left(-\left(\frac{x-\theta _0}{\sigma }\right)^{c}\right)$ | $x>\theta _0$ | $\theta _0$ (known) | $\sigma $ | $c$ |


You can request these distributions with the BETA, EXPONENTIAL, GAMMA, GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, WEIBULL, and WEIBULL2 options, respectively. If you do not specify a distribution option, a normal probability plot or a normal Q-Q plot is created.

The following sections provide details for constructing Q-Q plots that are based on these distributions. Probability plots are constructed similarly except that the horizontal axis is scaled in percentile units.
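The construction shared by all of the following sections can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SAS code; the function names `plotting_positions` and `qq_points` are hypothetical, and the sketch assumes that missing values have already been removed so that the list length equals $n$.

```python
from typing import Callable, List, Tuple

def plotting_positions(n: int) -> List[float]:
    """Return (i - 0.375) / (n + 0.25) for i = 1, ..., n."""
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

def qq_points(data: List[float],
              quantile: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Pair each theoretical quantile (horizontal) with an ordered observation (vertical)."""
    ordered = sorted(data)  # order the observations from smallest to largest
    probs = plotting_positions(len(ordered))
    return [(quantile(p), x) for p, x in zip(probs, ordered)]
```

The positions are symmetric about 0.5; for example, with $n = 5$ the middle position is exactly $(3 - 0.375)/5.25 = 0.5$. Each distribution below differs only in the `quantile` function that is applied to these positions.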

Beta Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $B_{ \alpha \beta }^{-1} \left( \frac{ i - 0.375 }{ n + 0.25 } \right)$, where $B_{ \alpha \beta }^{-1} ( \cdot )$ is the inverse normalized incomplete beta function, $n$ is the number of nonmissing observations, and $\alpha $ and $\beta $ are the shape parameters of the beta distribution. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot for ALPHA=$\alpha $ and BETA=$\beta $ tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are beta distributed with the specific density function

\[  p(x)=\left\{ \begin{array}{ll} \frac{(x - \theta )^{\alpha - 1} (\theta + \sigma - x)^{\beta - 1} }{B(\alpha ,\beta ) \sigma ^{(\alpha + \beta - 1)} } &  \mbox{for $\theta < x < \theta + \sigma $} \\ 0 &  \mbox{for $x \leq \theta $ or $x \geq \theta + \sigma $ } \end{array} \right.  \]

where $B(\alpha ,\beta ) = \frac{\Gamma (\alpha )\Gamma (\beta )}{\Gamma (\alpha +\beta )} $ and

  • $\theta = $ lower threshold parameter

  • $\sigma = $ scale parameter $(\sigma >0)$

  • $\alpha = $ first shape parameter $(\alpha >0)$

  • $\beta = $ second shape parameter $(\beta >0)$

Exponential Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $-\log \!  \left(1-\frac{i-0.375}{n+0.25} \right)$, where $n$ is the number of nonmissing observations. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are exponentially distributed with the specific density function

\[  p( x )= \left\{  \begin{array}{ll} \frac{ 1 }{ \sigma } \exp \left( - \frac{ x - \theta }{ \sigma } \right) &  \mbox{ for $ x \geq \theta $ } \\ 0 &  \mbox{ for $ x < \theta $ } \end{array} \right.  \]

where $\theta $ is a threshold parameter, and $\sigma $ is a positive scale parameter.
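The following Python sketch illustrates the intercept and slope interpretation for the exponential case. It builds idealized observations that lie exactly at $\theta + \sigma \times$ quantile and recovers both parameters from a straight line through the points. The least-squares fit and the names `exponential_qq` and `fit_line` are assumptions made for this illustration; they are not a description of how the procedure estimates parameters.

```python
import math

def exponential_qq(data):
    """Return (quantile, observation) pairs for an exponential Q-Q plot."""
    ordered = sorted(data)  # assumes missing values were already removed
    n = len(ordered)
    return [(-math.log(1 - (i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(ordered, start=1)]

def fit_line(points):
    """Ordinary least-squares intercept and slope through (q, x) points."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sqq = sum((q - mean_q) ** 2 for q, _ in points)
    sqx = sum((q - mean_q) * (x - mean_x) for q, x in points)
    slope = sqx / sqq
    return mean_x - slope * mean_q, slope

# Idealized data with theta = 2 and sigma = 3 (illustrative values only).
n = 20
ideal = [2.0 + 3.0 * -math.log(1 - (i - 0.375) / (n + 0.25))
         for i in range(1, n + 1)]
theta_hat, sigma_hat = fit_line(exponential_qq(ideal))
```

With real data the points scatter around the line, so the fitted intercept and slope are only rough guides to $\theta $ and $\sigma $.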

Gamma Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $G_{\alpha }^{-1} \left( \frac{ i - 0.375 }{ n + 0.25 } \right)$, where $G_{\alpha }^{-1}(\cdot )$ is the inverse normalized incomplete gamma function, $n$ is the number of nonmissing observations, and $\alpha $ is the shape parameter of the gamma distribution. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot for ALPHA=$\alpha $ tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are gamma distributed with the specific density function

\[  p(x)= \left\{  \begin{array}{ll} \frac{1}{ \sigma \Gamma (\alpha ) } \left( \frac{ x - \theta }{ \sigma } \right) ^{\alpha - 1} \exp \left( - \frac{ x - \theta }{ \sigma } \right) &  \mbox{ for $ x > \theta $ } \\ 0 &  \mbox{ for $ x \leq \theta $ } \end{array} \right.  \]

where

  • $\theta = $ threshold parameter

  • $\sigma = $ scale parameter $(\sigma >0)$

  • $\alpha = $ shape parameter $(\alpha >0)$

Gumbel Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $-\log \left( -\log \left( \frac{ i - 0.375 }{ n + 0.25 } \right) \right)$, where $n$ is the number of nonmissing observations. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot tends to be linear with intercept $\mu $ and slope $\sigma $ if the data are Gumbel distributed with the specific density function

\[  p(x) = \frac{e^{-(x-\mu )/\sigma }}{\sigma } \exp \left( -e^{-(x-\mu )/\sigma }\right)  \]

where

  • $\mu = $ location parameter

  • $\sigma = $ scale parameter $(\sigma >0)$
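As a short worked step showing where the intercept and slope come from (the symbol $v$ is introduced here only as shorthand for $\frac{i-0.375}{n+0.25}$), integrating the Gumbel density gives the cumulative distribution function $F(x) = \exp \left( -e^{-(x-\mu )/\sigma } \right)$. Setting $F(x) = v$ and solving for $x$ yields

\[  x = \mu + \sigma \left( -\log \left( -\log v \right) \right)  \]

so an observation at the $v$th quantile is a linear function of the plotted quantile $-\log (-\log v)$, with intercept $\mu $ and slope $\sigma $.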

Lognormal Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $\exp \!  \left( \sigma \Phi ^{-1} \!  \left( \frac{ i - 0.375 }{ n + 0.25 } \right) \right)$, where $\Phi ^{-1}(\cdot )$ is the inverse cumulative standard normal distribution, $n$ is the number of nonmissing observations, and $\sigma $ is the shape parameter of the lognormal distribution. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot for SIGMA=$\sigma $ tends to be linear with intercept $\theta $ and slope $\exp (\zeta )$ if the data are lognormally distributed with the specific density function

\[  p(x) = \left\{  \begin{array}{ll} \frac{1}{ \sigma \sqrt {2 \pi }(x - \theta ) } \exp \left(-\frac{ (\log (x - \theta )- \zeta )^{2} }{2 \sigma ^{2} } \right) &  \mbox{for $x > \theta $} \\ 0 &  \mbox{for $x \leq \theta $} \end{array} \right.  \]

where

  • $\theta = $ threshold parameter

  • $\zeta = $ scale parameter

  • $\sigma = $ shape parameter $(\sigma > 0)$

See Example 4.26 and Example 4.33.
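A minimal Python sketch of the lognormal quantile computation follows. It uses `statistics.NormalDist` from the standard library for $\Phi ^{-1}$; the function name `lognormal_quantiles` and the parameter values are illustrative assumptions. The idealized observations are placed exactly at $\theta + \exp (\zeta ) \times$ quantile, so the line through them has intercept $\theta $ and slope $\exp (\zeta )$.

```python
import math
from statistics import NormalDist

def lognormal_quantiles(n: int, sigma: float):
    """Return exp(sigma * Phi^{-1}((i - 0.375) / (n + 0.25))) for i = 1, ..., n."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [math.exp(sigma * std_normal.inv_cdf((i - 0.375) / (n + 0.25)))
            for i in range(1, n + 1)]

# Illustrative parameter values (not taken from any example in this chapter).
theta, zeta, sigma = 1.0, 0.5, 0.8
q = lognormal_quantiles(10, sigma)
x = [theta + math.exp(zeta) * qi for qi in q]  # idealized observations

# Slope and intercept of the line through the first and last points.
slope = (x[-1] - x[0]) / (q[-1] - q[0])
intercept = x[0] - slope * q[0]
```

Because the quantiles depend on $\sigma $, a different SIGMA= value changes the horizontal positions, and a poorly chosen value typically shows up as curvature in the point pattern.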

Normal Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $\Phi ^{-1} \! \!  \left( \frac{i- 0.375}{n+ 0.25} \right)$, where $\Phi ^{-1}(\cdot )$ is the inverse cumulative standard normal distribution and $n$ is the number of nonmissing observations. In a probability plot, the horizontal axis is scaled in percentile units.

The point pattern on the plot tends to be linear with intercept $\mu $ and slope $\sigma $ if the data are normally distributed with the specific density function

\[  p(x) = \frac{1}{\sigma \sqrt {2 \pi } } \exp \left( -\frac{(x - \mu )^2}{2 \sigma ^{2}} \right) \quad \mbox{for all $x$}  \]

where $\mu $ is the mean and $\sigma $ is the standard deviation ($\sigma > 0$).
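For the normal case, the plotted points can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of $\Phi ^{-1}$. The function name `normal_qq_points` and the five sample values are hypothetical and serve only to illustrate the computation.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Return (normal quantile, ordered observation) pairs for a Q-Q plot."""
    ordered = sorted(data)  # assumes missing values were already removed
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical sample of five measurements.
points = normal_qq_points([4.1, 5.3, 4.8, 5.9, 5.0])
```

With an odd number of observations, the middle observation is plotted against the quantile 0, and the remaining quantiles are symmetric about 0.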

Generalized Pareto Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $( 1 - ( 1 - \frac{ i - 0.375 }{ n + 0.25 } )^{\alpha } ) / \alpha $ ($\alpha \neq 0$) or $-\log ( 1 - \frac{ i - 0.375 }{ n + 0.25 } )$ ($\alpha = 0$), where $n$ is the number of nonmissing observations and $\alpha $ is the shape parameter of the generalized Pareto distribution. In a probability plot, the horizontal axis is scaled in percentile units.

The point pattern on the plot for ALPHA=$\alpha $ tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are generalized Pareto distributed with the specific density function

\[  p(x) = \left\{  \begin{array}{ll} \frac{1}{\sigma }{(1 - \alpha (x-\theta )/\sigma )}^{1/\alpha -1} &  \mbox{if $ \alpha \neq 0$} \\ \frac{1}{\sigma }\exp (-(x-\theta )/\sigma ) &  \mbox{if $ \alpha = 0$} \end{array} \right.  \]

where

  • $\theta = $ threshold parameter

  • $\sigma = $ scale parameter $(\sigma >0)$

  • $\alpha = $ shape parameter $(\alpha >0)$
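The two branches of the generalized Pareto quantile can be sketched in Python as follows. The function name `gpd_quantile` is hypothetical, and the function returns the standardized quantile (that is, for $\theta = 0$ and $\sigma = 1$) that is used on the horizontal axis.

```python
import math

def gpd_quantile(p: float, alpha: float) -> float:
    """Standardized generalized Pareto quantile for probability p in (0, 1)."""
    if alpha == 0:
        # Limiting case: identical to the exponential quantile.
        return -math.log(1 - p)
    return (1 - (1 - p) ** alpha) / alpha
```

The $\alpha = 0$ branch is the limit of the $\alpha \neq 0$ branch as $\alpha $ approaches 0, so the two expressions agree closely for very small $\alpha $. For $\alpha = 1$ the quantile reduces to $p$ itself, which corresponds to a uniform density on $(\theta , \theta + \sigma )$.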

Power Function Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $B_{ \alpha (1) }^{-1} \left( \frac{ i - 0.375 }{ n + 0.25 } \right)$, where $B_{ \alpha (1) }^{-1} ( \cdot )$ is the inverse normalized incomplete beta function, $n$ is the number of nonmissing observations, $\alpha $ is the first shape parameter of the beta distribution, and the second shape parameter is fixed at $\beta = 1$. In a probability plot, the horizontal axis is scaled in percentile units.

The point pattern on the plot for ALPHA=$\alpha $ tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are power function distributed with the specific density function

\[  p(x) = \left\{  \begin{array}{ll} \frac{\alpha }{\sigma }\left(\frac{x-\theta }{\sigma }\right)^{\alpha -1} &  \mbox{for $\theta < x < \theta + \sigma $} \\ 0 &  \mbox{for $x \leq \theta $ or $x \geq \theta + \sigma $ } \end{array} \right.  \]

where

  • $\theta = $ threshold parameter

  • $\sigma = $ scale parameter $(\sigma > 0)$

  • $\alpha = $ shape parameter $(\alpha > 0)$
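When the second shape parameter equals 1, the inverse normalized incomplete beta function has a simple closed form. As a short worked step (the symbols $I_ x$ and $v$ are notation introduced here for illustration, with $v$ standing for $\frac{i-0.375}{n+0.25}$), note that $B(\alpha ,1) = \frac{\Gamma (\alpha )\Gamma (1)}{\Gamma (\alpha +1)} = \frac{1}{\alpha }$, so

\[  I_ x(\alpha ,1) = \frac{1}{B(\alpha ,1)} \int _0^ x t^{\alpha -1} \, dt = x^{\alpha } \quad \mbox{and therefore} \quad B_{ \alpha (1) }^{-1}(v) = v^{1/\alpha }  \]

This agrees with the cumulative distribution function of the density above, $F(x) = \left( \frac{x-\theta }{\sigma } \right)^{\alpha }$ for $\theta < x < \theta + \sigma $, whose inverse is $x = \theta + \sigma v^{1/\alpha }$.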

Rayleigh Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $\sqrt {-2 \log \left(1-\frac{i-0.375}{n+0.25} \right)}$, where $n$ is the number of nonmissing observations. In a probability plot, the horizontal axis is scaled in percentile units.

The point pattern on the plot tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are Rayleigh distributed with the specific density function

\[  p(x) = \left\{  \begin{array}{ll} \frac{x-\theta }{\sigma ^2}\exp (-(x-\theta )^2/(2\sigma ^2)) &  \mbox{for $x \geq \theta $} \\ 0 &  \mbox{for $x <\theta $} \end{array} \right.  \]

where $\theta $ is a threshold parameter, and $\sigma $ is a positive scale parameter.

Three-Parameter Weibull Distribution

To create the plot, the observations are ordered from smallest to largest, and the $i$th ordered observation is plotted against the quantile $\left( -\log \!  \left(1-\frac{i-0.375}{n+0.25} \right) \right)^{\frac{1}{c}}$, where $n$ is the number of nonmissing observations, and $c$ is the Weibull distribution shape parameter. In a probability plot, the horizontal axis is scaled in percentile units.

The pattern on the plot for C=$c$ tends to be linear with intercept $\theta $ and slope $\sigma $ if the data are Weibull distributed with the specific density function

\[  p(x)= \left\{  \begin{array}{ll} \frac{c}{\sigma } \left( \frac{x - \theta }{\sigma } \right)^{c - 1} \exp \left( - \left( \frac{x - \theta }{\sigma } \right)^{c} \right) &  \mbox{ for $x > \theta $ } \\ 0 &  \mbox{ for $x \leq \theta $ } \end{array} \right.  \]

where

  • $\theta = $ threshold parameter

  • $\sigma = $ scale parameter $(\sigma >0)$

  • $c = $ shape parameter $( c > 0 )$

See Example 4.34.
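The three-parameter Weibull quantile computation can be sketched in Python as follows. The function names `weibull_quantile` and `weibull_qq_points` are hypothetical, and the sketch assumes that the shape value `c` is supplied by the user, as with the C= option.

```python
import math

def weibull_quantile(p: float, c: float) -> float:
    """Standardized Weibull quantile (theta = 0, sigma = 1) for shape c > 0."""
    return (-math.log(1 - p)) ** (1 / c)

def weibull_qq_points(data, c: float):
    """Return (Weibull quantile, ordered observation) pairs for a Q-Q plot."""
    ordered = sorted(data)  # assumes missing values were already removed
    n = len(ordered)
    return [(weibull_quantile((i - 0.375) / (n + 0.25), c), x)
            for i, x in enumerate(ordered, start=1)]
```

Two special cases tie this expression to earlier sections: with $c = 1$ it reduces to the exponential quantile, and with $c = 2$ it equals the Rayleigh quantile divided by $\sqrt {2}$.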

Two-Parameter Weibull Distribution

To create the plot, the observations are ordered from smallest to largest, and the log of the shifted $i$th ordered observation $x_{(i)}$, denoted by $\log (x_{(i)} - \theta _0 )$, is plotted against the quantile $\log \!  \left(-\log \!  \left(1-\frac{i-0.375}{n+0.25}\right)\right)$, where $n$ is the number of nonmissing observations. In a probability plot, the horizontal axis is scaled in percentile units.

Unlike the three-parameter Weibull quantile, the preceding expression is free of distribution parameters. Consequently, the C= shape parameter is not mandatory with the WEIBULL2 distribution option.

The pattern on the plot for THETA=$\theta _0$ tends to be linear with intercept $\log (\sigma )$ and slope $\frac{1}{c}$ if the data are Weibull distributed with the specific density function

\[  p(x) = \left\{  \begin{array}{ll} \frac{c}{\sigma } \left( \frac{x - \theta _0}{\sigma } \right)^{c - 1} \exp \left( - \left( \frac{x - \theta _0}{\sigma } \right)^{c} \right) &  \mbox{ for $x > \theta _0$ } \\ 0 &  \mbox{ for $x \leq \theta _0$ } \end{array} \right.  \]

where

  • $\theta _0 = $ known lower threshold

  • $\sigma = $ scale parameter $(\sigma >0)$

  • $c = $ shape parameter $(c >0)$

See Example 4.34.
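The intercept and slope stated for the two-parameter case follow from a short worked step (the symbol $v$ is introduced here only as shorthand for $\frac{i-0.375}{n+0.25}$). The cumulative distribution function for the density above is $F(x) = 1 - \exp \left( -\left( \frac{x-\theta _0}{\sigma } \right)^{c} \right)$ for $x > \theta _0$. Setting $F(x) = v$ and taking logarithms twice gives

\[  \left( \frac{x-\theta _0}{\sigma } \right)^{c} = -\log (1-v) \quad \Longrightarrow \quad \log (x-\theta _0) = \log (\sigma ) + \frac{1}{c} \log \left( -\log (1-v) \right)  \]

so the log of the shifted observation is linear in the plotted quantile $\log (-\log (1-v))$, with intercept $\log (\sigma )$ and slope $\frac{1}{c}$. Neither $\sigma $ nor $c$ appears in the quantile itself, which is why the shape parameter does not need to be specified in advance.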