Dictionary of Options

The following entries provide detailed descriptions of the options specific to the PPPLOT statement. The note Line Printer identifies options that apply only to line printer plots. See Dictionary of Common Options: CAPABILITY Procedure for detailed descriptions of options common to all the plot statements.

ALPHA=value

specifies the shape parameter $\alpha $ $(\alpha >0)$ for P-P plots requested with the BETA, GAMMA, PARETO, and POWER options. For examples, see the entries for the distribution options.

BETA<(beta-options)>

creates a beta P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical beta cdf value

$ B_{\alpha \beta }\left(\frac{x_{(i)}-\theta }{\sigma }\right) = \int _{\theta }^{x_{(i)}} \frac{(t-\theta )^{\alpha -1}(\theta +\sigma -t)^{\beta -1} }{B(\alpha ,\beta ) \sigma ^{(\alpha +\beta -1)} } dt $

where $B_{\alpha \beta }(\cdot )$ is the normalized incomplete beta function, $B(\alpha ,\beta ) = \frac{\Gamma (\alpha )\Gamma (\beta )}{\Gamma (\alpha +\beta )} $, and

$\theta = $ lower threshold parameter
$\sigma = $ scale parameter $(\sigma >0)$
$\alpha = $ first shape parameter $(\alpha >0)$
$\beta = $ second shape parameter $(\beta >0)$

You can specify $\alpha $, $\beta $, $\sigma $, and $\theta $ with the ALPHA=, BETA=, SIGMA=, and THETA= beta-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / beta(theta=1 sigma=2 alpha=3 beta=4);
run;

If you do not specify values for these parameters, then by default, $\theta =0$, $\sigma =1$, and maximum likelihood estimates are calculated for $\alpha $ and $\beta $.

IMPORTANT: If the default unit interval (0,1) does not adequately describe the range of your data, then you should specify THETA=$\theta $ and SIGMA=$\sigma $ so that your data fall in the interval $(\theta , \theta +\sigma )$.

If the data are beta distributed with parameters $\alpha $, $\beta $, $\sigma $, and $\theta $, then the points on the plot for ALPHA=$\alpha $, BETA=$\beta $, SIGMA=$\sigma $, and THETA=$\theta $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified beta distribution is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
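As a brief sketch of the aliases just described, the following statements request the same beta P-P plot as the earlier example, with THRESHOLD= and SCALE= written in place of THETA= and SIGMA=. The data set measures, the variable width, and the parameter values are carried over from that example for illustration only:

```sas
proc capability data=measures;
   /* THRESHOLD= and SCALE= are aliases for THETA= and SIGMA= */
   ppplot width / beta(threshold=1 scale=2 alpha=3 beta=4);
run;
```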

BETA=value

specifies the second shape parameter $\beta $ $(\beta >0)$ for P-P plots requested with the BETA distribution option. See the preceding entry for the BETA distribution option for an example.

C=value

specifies the shape parameter c (c > 0) for P-P plots requested with the WEIBULL option. See the entry for the WEIBULL option for examples.

EXPONENTIAL<(exponential-options)>
EXP<(exponential-options)>

creates an exponential P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical exponential cdf value

$ F(x_{(i)}) = 1-\exp \left(-\frac{x_{(i)}-\theta }{\sigma }\right) $

where

$\theta = $ threshold parameter
$\sigma = $ scale parameter $(\sigma >0)$

You can specify $\sigma $ and $\theta $ with the SIGMA= and THETA= exponential-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / exponential(theta=1 sigma=2);
run;

If you do not specify values for these parameters, then by default, $\theta =0$ and a maximum likelihood estimate is calculated for $\sigma $.

IMPORTANT: Your data must be greater than or equal to the lower threshold $\theta $. If the default $\theta =0$ is not an adequate lower bound for your data, specify $\theta $ with the THETA= option.

If the data are exponentially distributed with parameters $\sigma $ and $\theta $, the points on the plot for SIGMA=$\sigma $ and THETA=$\theta $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified exponential distribution is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.

GAMMA<(gamma-options)>

creates a gamma P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical gamma cdf value

$ G_{\alpha }\left(\frac{x_{(i)}-\theta }{\sigma }\right) = \int _{\theta }^{x_{(i)}} \frac{1}{\sigma \Gamma (\alpha )} \left(\frac{t-\theta }{\sigma }\right)^{\alpha -1} \exp \left(-\frac{t-\theta }{\sigma }\right) dt $

where $G_{\alpha }(\cdot )$ is the normalized incomplete gamma function, and

$\theta = $ threshold parameter
$\sigma = $ scale parameter $(\sigma >0)$
$\alpha = $ shape parameter $(\alpha >0)$

You can specify $\alpha $, $\sigma $, and $\theta $ with the ALPHA=, SIGMA=, and THETA= gamma-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / gamma(alpha=1 sigma=2 theta=3);
run;

If you do not specify values for these parameters, then by default, $\theta =0$ and maximum likelihood estimates are calculated for $\alpha $ and $\sigma $.

IMPORTANT: Your data must be greater than or equal to the lower threshold $\theta $. If the default $\theta =0$ is not an adequate lower bound for your data, specify $\theta $ with the THETA= option.

If the data are gamma distributed with parameters $\alpha $, $\sigma $, and $\theta $, the points on the plot for ALPHA=$\alpha $, SIGMA=$\sigma $, and THETA=$\theta $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified gamma distribution is a good fit. You can specify the SHAPE= option as an alias for the ALPHA= option, the SCALE= option as an alias for the SIGMA= option, and the THRESHOLD= option as an alias for the THETA= option.
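The aliases can be sketched with the same request as the earlier gamma example, rewritten with SHAPE=, SCALE=, and THRESHOLD=. The data set, variable, and parameter values are again taken from that example and are purely illustrative:

```sas
proc capability data=measures;
   /* SHAPE=, SCALE=, and THRESHOLD= are aliases for */
   /* ALPHA=, SIGMA=, and THETA=, respectively       */
   ppplot width / gamma(shape=1 scale=2 threshold=3);
run;
```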

GUMBEL<(Gumbel-options)>

creates a Gumbel P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical Gumbel cdf value

\[  F(x_{(i)}) = \exp \left( -e^{-(x_{(i)} - \mu )/\sigma } \right)  \]

where

$\mu =$ location parameter
$\sigma =$ scale parameter $(\sigma >0)$

You can specify $\mu $ and $\sigma $ with the MU= and SIGMA= Gumbel-options. By default, maximum likelihood estimates are computed for $\mu $ and $\sigma $.

If the data are Gumbel distributed with parameters $\mu $ and $\sigma $, the points on the plot for MU=$\mu $ and SIGMA=$\sigma $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified Gumbel distribution is a good fit.
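A minimal sketch of a Gumbel P-P plot request with both parameters specified, written in the style of the other entries. The data set measures and the variable width are borrowed from those entries, and the parameter values are arbitrary placeholders rather than recommended settings:

```sas
proc capability data=measures;
   /* Illustrative values; omit MU= and SIGMA= to use */
   /* maximum likelihood estimates instead            */
   ppplot width / gumbel(mu=1 sigma=2);
run;
```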

IGAUSS<(iGauss-options)>

creates an inverse Gaussian P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical inverse Gaussian cdf value

\[  F(x_{(i)}) = \Phi \left\{  \sqrt {\frac{\lambda }{x_{(i)}}} \left( \frac{x_{(i)}}{\mu } - 1 \right) \right\}  + e^{2\lambda /\mu } \Phi \left\{  -\sqrt {\frac{\lambda }{x_{(i)}}} \left( \frac{x_{(i)}}{\mu } + 1 \right) \right\}   \]

where $\Phi (\cdot )$ is the standard normal cumulative distribution function, and

$\mu =$ mean parameter $(\mu > 0)$
$\lambda =$ shape parameter $(\lambda >0)$

You can specify known values for $\mu $ and $\lambda $ with the MU= and LAMBDA= iGauss-options. By default, the sample mean is used for $\mu $ and a maximum likelihood estimate is computed for $\lambda $.

If the data are inverse Gaussian distributed with parameters $\mu $ and $\lambda $, the points on the plot for MU=$\mu $ and LAMBDA=$\lambda $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified inverse Gaussian distribution is a good fit.
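A minimal sketch of an inverse Gaussian P-P plot request with known parameter values, assuming the same measures data set and width variable used in the other entries. The values shown are placeholders chosen only to satisfy $\mu >0$ and $\lambda >0$:

```sas
proc capability data=measures;
   /* Illustrative values; both parameters must be positive */
   ppplot width / igauss(mu=1 lambda=2);
run;
```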

LAMBDA=value

specifies the shape parameter $\lambda $ ($\lambda > 0$) for P-P plots requested with the IGAUSS option. Enclose the LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a value for $\lambda $, the procedure calculates a maximum likelihood estimate.

LOGNORMAL<(lognormal-options)>
LNORM<(lognormal-options)>

creates a lognormal P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical lognormal cdf value

$ \Phi \left(\frac{\log (x_{(i)}-\theta )-\zeta }{\sigma }\right) $

where $\Phi (\cdot )$ is the cumulative standard normal distribution function, and

$\theta = $ threshold parameter
$\zeta = $ scale parameter
$\sigma = $ shape parameter $(\sigma > 0)$

You can specify $\theta $, $\zeta $, and $\sigma $ with the THETA=, ZETA=, and SIGMA= lognormal-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / lognormal(theta=1 zeta=2 sigma=3);
run;

If you do not specify values for these parameters, then by default, $\theta =0$ and maximum likelihood estimates are calculated for $\sigma $ and $\zeta $.

IMPORTANT: Your data must be greater than the lower threshold $\theta $. If the default $\theta =0$ is not an adequate lower bound for your data, specify $\theta $ with the THETA= option.

If the data are lognormally distributed with parameters $\sigma $, $\theta $, and $\zeta $, the points on the plot for SIGMA=$\sigma $, THETA=$\theta $, and ZETA=$\zeta $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified lognormal distribution is a good fit. You can specify the SHAPE= option as an alias for the SIGMA= option, the SCALE= option as an alias for the ZETA= option, and the THRESHOLD= option as an alias for the THETA= option.

MU=value

specifies the parameter $\mu $ for P-P plots requested with the GUMBEL, IGAUSS, and NORMAL options. For examples, see Figure 5.31, Figure 5.32, and Figure 5.33. For the normal and inverse Gaussian distributions, the default value of $\mu $ is the sample mean. If you do not specify a value for $\mu $ for the Gumbel distribution, the procedure calculates a maximum likelihood estimate.

NOLINE

suppresses the diagonal reference line.
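As a short sketch, NOLINE is listed after the slash alongside the distribution option. The data set measures and the variable width are assumed from the other examples in this section:

```sas
proc capability data=measures;
   /* Normal P-P plot without the diagonal reference line y=x */
   ppplot width / normal noline;
run;
```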

NOOBSLEGEND
NOOBSL

Line Printer: suppresses the legend that indicates the number of hidden observations.

NORMAL<(normal-options)>
NORM<(normal-options)>

creates a normal P-P plot. By default, if you do not specify a distribution option, the procedure displays a normal P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical normal cdf value

$ \Phi \left(\frac{x_{(i)}-\mu }{\sigma }\right) = \int _{-\infty }^{x_{(i)}} \frac{1}{\sigma \sqrt {2 \pi } } \exp \left( -\frac{(t - \mu )^2}{2 \sigma ^{2}} \right) dt $

where $\Phi (\cdot )$ is the cumulative standard normal distribution function, and

$\mu = $ location parameter or mean
$\sigma = $ scale parameter or standard deviation $(\sigma > 0)$

You can specify $\mu $ and $\sigma $ with the MU= and SIGMA= normal-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / normal(mu=1 sigma=2);
run;

By default, the sample mean and sample standard deviation are used for $\mu $ and $\sigma $.

If the data are normally distributed with parameters $\mu $ and $\sigma $, the points on the plot for MU=$\mu $ and SIGMA=$\sigma $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified normal distribution is a good fit. For an example, see Figure 5.31.
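Because the normal distribution is the default, the two PPPLOT statements in the following sketch should request the same plot, with $\mu $ and $\sigma $ taken from the sample mean and sample standard deviation. The data set and variable names are assumed from the earlier example:

```sas
proc capability data=measures;
   /* No distribution option: a normal P-P plot is displayed */
   ppplot width;
   /* Equivalent request with the NORMAL option stated explicitly */
   ppplot width / normal;
run;
```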

PARETO<(Pareto-options)>

creates a generalized Pareto P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical generalized Pareto cdf value

\[  F(x_{(i)}) = 1 - { \left( 1 - \frac{\alpha (x_{(i)} - \theta )}{\sigma } \right) }^{\frac {1}{\alpha }}  \]

where

$\theta =$ threshold parameter
$\sigma =$ scale parameter $(\sigma >0)$
$\alpha =$ shape parameter

The parameter $\theta $ for the generalized Pareto distribution must be less than the minimum data value. You can specify $\theta $ with the THETA= Pareto-option. The default value for $\theta $ is 0. In addition, the generalized Pareto distribution has a shape parameter $\alpha $ and a scale parameter $\sigma $. You can specify these parameters with the ALPHA= and SIGMA= Pareto-options. By default, maximum likelihood estimates are computed for $\alpha $ and $\sigma $.

If the data are generalized Pareto distributed with parameters $\theta $, $\sigma $, and $\alpha $, the points on the plot for THETA=$\theta $, SIGMA=$\sigma $, and ALPHA=$\alpha $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified generalized Pareto distribution is a good fit.
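A minimal sketch of a generalized Pareto P-P plot request with a user-specified threshold, leaving $\alpha $ and $\sigma $ to be estimated. The data set measures and the variable width are assumed from the other entries, and the threshold value is a placeholder that must lie below the smallest data value:

```sas
proc capability data=measures;
   /* THETA= must be less than the minimum of width;     */
   /* ALPHA= and SIGMA= are omitted, so maximum          */
   /* likelihood estimates are computed for them         */
   ppplot width / pareto(theta=1);
run;
```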

POWER<(power-options)>

creates a power function P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical power function cdf value

\[  F(x_{(i)}) = {\left( \frac{x_{(i)} - \theta }{\sigma } \right)}^{\alpha }  \]

where

$\theta =$ lower threshold parameter (lower endpoint)
$\sigma =$ scale parameter $(\sigma > 0)$
$\alpha =$ shape parameter $(\alpha > 0)$

The power function distribution is bounded below by the parameter $\theta $ and above by the value $\theta + \sigma $. You can specify $\theta $ and $\sigma $ by using the THETA= and SIGMA= power-options. The default values for $\theta $ and $\sigma $ are 0 and 1, respectively.

You can specify a value for the shape parameter, $\alpha $, with the ALPHA= power-option. If you do not specify a value for $\alpha $, the procedure calculates a maximum likelihood estimate.

The power function distribution is a special case of the beta distribution with its second shape parameter, $\beta = 1$.
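The special case can be checked directly from the beta cdf given in the BETA entry. Setting $\beta = 1$ makes the factor $(\theta +\sigma -t)^{\beta -1}$ equal to 1 and gives $B(\alpha ,1) = \frac{\Gamma (\alpha )\Gamma (1)}{\Gamma (\alpha +1)} = \frac{1}{\alpha }$, so the integral reduces to the power function cdf:

```latex
\[
  B_{\alpha 1}\left(\frac{x_{(i)}-\theta }{\sigma }\right)
  = \int _{\theta }^{x_{(i)}} \frac{(t-\theta )^{\alpha -1}}{B(\alpha ,1)\, \sigma ^{\alpha }} \, dt
  = \frac{\alpha }{\sigma ^{\alpha }} \cdot \frac{(x_{(i)}-\theta )^{\alpha }}{\alpha }
  = \left( \frac{x_{(i)}-\theta }{\sigma } \right)^{\alpha }
\]
```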

If the data are power function distributed with parameters $\theta $, $\sigma $, and $\alpha $, the points on the plot for THETA=$\theta $, SIGMA=$\sigma $, and ALPHA=$\alpha $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified power function distribution is a good fit.
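A minimal sketch of a power function P-P plot request, assuming the measures data set and width variable from the other entries. The parameter values are placeholders; in practice THETA= and SIGMA= would be chosen so that the data fall between $\theta $ and $\theta + \sigma $:

```sas
proc capability data=measures;
   /* Illustrative values: the distribution is bounded   */
   /* below by theta=1 and above by theta+sigma=3        */
   ppplot width / power(theta=1 sigma=2 alpha=3);
run;
```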

PPSYMBOL='character'

Line Printer: specifies the character used to plot the points in a line printer plot. The default is the plus sign (+).

RAYLEIGH<(Rayleigh-options)>

creates a Rayleigh P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical Rayleigh cdf value

\[  F(x_{(i)}) = 1 - e^{-(x_{(i)} - \theta )^2/(2\sigma ^2)}  \]

where

$\theta =$ threshold parameter
$\sigma =$ scale parameter $(\sigma >0)$

The parameter $\theta $ for the Rayleigh distribution must be less than the minimum data value. You can specify $\theta $ with the THETA= Rayleigh-option. The default value for $\theta $ is 0. You can specify $\sigma $ with the SIGMA= Rayleigh-option. By default, a maximum likelihood estimate is computed for $\sigma $.

If the data are Rayleigh distributed with parameters $\theta $ and $\sigma $, the points on the plot for THETA=$\theta $ and SIGMA=$\sigma $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified Rayleigh distribution is a good fit.
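A minimal sketch of a Rayleigh P-P plot request with both parameters specified, assuming the same measures data set and width variable as the other entries. The values are arbitrary placeholders, subject to the requirement that the threshold lie below the smallest data value:

```sas
proc capability data=measures;
   /* THETA= must be less than the minimum of width; */
   /* omit SIGMA= to use a maximum likelihood estimate */
   ppplot width / rayleigh(theta=1 sigma=2);
run;
```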

SIGMA=value

specifies the parameter $\sigma $, where $\sigma >0$. When used with the BETA, EXPONENTIAL, GAMMA, GUMBEL, NORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL options, the SIGMA= option specifies the scale parameter. When used with the LOGNORMAL option, the SIGMA= option specifies the shape parameter. Enclose the SIGMA= option in parentheses after the distribution keyword. For an example of the SIGMA= option used with the NORMAL option, see Figure 5.31.

SQUARE

displays the P-P plot in a square frame. The default is a rectangular frame. See Figure 5.31 for an example.
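As a short sketch, SQUARE is listed after the slash together with the distribution option. The data set, variable, and parameter values below are reused from the NORMAL entry and are illustrative only:

```sas
proc capability data=measures;
   /* Normal P-P plot drawn in a square frame */
   ppplot width / normal(mu=1 sigma=2) square;
run;
```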

SYMBOL='character'

Line Printer: specifies the character used for the diagonal reference line in line printer plots. The default character is the first letter of the distribution option keyword.

THETA=value
THRESHOLD=value

specifies the lower threshold parameter $\theta $ for plots requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL options.

WEIBULL<(Weibull-options)>
WEIB<(Weibull-options)>

creates a Weibull P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to largest:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

The y-coordinate of the ith point is the empirical cdf value $\frac{i}{n}$. The x-coordinate is the theoretical Weibull cdf value

$ F(x_{(i)}) = 1-\exp \left( -\left( \frac{x_{(i)}-\theta }{\sigma } \right)^{c} \right) $

where

$\theta = $ threshold parameter
$\sigma = $ scale parameter $(\sigma >0)$
c = shape parameter (c > 0)

You can specify c, $\sigma $, and $\theta $ with the C=, SIGMA=, and THETA= Weibull-options, as illustrated in the following example:

proc capability data=measures;
   ppplot width / weibull(theta=1 sigma=2 c=3);
run;

If you do not specify values for these parameters, then by default $\theta =0$ and maximum likelihood estimates are calculated for $\sigma $ and c.

IMPORTANT: Your data must be greater than or equal to the lower threshold $\theta $. If the default $\theta =0$ is not an adequate lower bound for your data, you should specify $\theta $ with the THETA= option.

If the data are Weibull distributed with parameters c, $\sigma $, and $\theta $, the points on the plot for C=c, SIGMA=$\sigma $, and THETA=$\theta $ tend to fall on or near the diagonal line $y=x$, which is displayed by default. Agreement between the diagonal line and the point pattern is evidence that the specified Weibull distribution is a good fit. You can specify the SHAPE= option as an alias for the C= option, the SCALE= option as an alias for the SIGMA= option, and the THRESHOLD= option as an alias for the THETA= option.
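The aliases can be sketched as follows, assuming the measures data set and width variable from the preceding example. The parameter values are arbitrary placeholders:

```sas
proc capability data=measures;
   /* SHAPE=, SCALE=, and THRESHOLD= are aliases for */
   /* C=, SIGMA=, and THETA=, respectively           */
   ppplot width / weibull(shape=3 scale=2 threshold=1);
run;
```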

ZETA=value

specifies a value for the scale parameter $\zeta $ for lognormal P-P plots requested with the LOGNORMAL option.