The UNIVARIATE Procedure

PROBPLOT Statement

  • PROBPLOT <variables> < / options>;

The PROBPLOT statement creates a probability plot, which compares ordered variable values with the percentiles of a specified theoretical distribution. If the data distribution matches the theoretical distribution, the points on the plot form a linear pattern. Consequently, you can use a probability plot to determine how well a theoretical distribution models a set of measurements.
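The linear pattern follows from a standard argument. The sketch below assumes that the theoretical distribution is a location-scale family with threshold (or location) θ and scale σ > 0. For distributions with a shape parameter, the shape is held fixed, which is consistent with the procedure drawing a separate plot for each shape value.

```latex
% Assume X = \theta + \sigma Y, where Y has a fixed distribution function G.
% Then every quantile of X is a linear function of the matching quantile of Y:
x_p = \theta + \sigma\, G^{-1}(p), \qquad 0 < p < 1
% Plotting the ordered values x_{(1)} \le \dots \le x_{(n)} against G^{-1}(p_i),
% where p_i is the plotting position of the i-th ordered value, therefore gives
% points that lie near a line with intercept \theta and slope \sigma.
```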

Probability plots are similar to Q-Q plots, which you can create with the QQPLOT statement. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters.

You can use any number of PROBPLOT statements in the UNIVARIATE procedure. The components of the PROBPLOT statement are as follows.

variables

are the variables for which probability plots are created. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input data set. If you do not specify a list of variables, then by default the procedure creates a probability plot for each variable listed in the VAR statement, or for each numeric variable in the DATA= data set if you do not specify a VAR statement. For example, each of the following PROBPLOT statements produces two probability plots, one for Length and one for Width:

proc univariate data=Measures;
   var Length Width;
   probplot;
run;

proc univariate data=Measures;
   probplot Length Width;
run;

options

specify the theoretical distribution for the plot or add features to the plot. If you specify more than one variable, the options apply equally to each variable. Specify all options after the slash (/) in the PROBPLOT statement. You can specify only one option that names a distribution in each PROBPLOT statement, but you can specify any number of other options. The available distributions are the beta, exponential, gamma, generalized Pareto, Gumbel, lognormal, normal, power function, Rayleigh, two-parameter Weibull, and three-parameter Weibull. By default, the procedure produces a plot for the normal distribution.

In the following example, the NORMAL option requests a normal probability plot for each variable, and the MU= and SIGMA= normal-options request a distribution reference line that corresponds to the normal distribution with μ = 10 and σ = 0.3. The SQUARE option displays the plot in a square frame, and the CTEXT= option specifies the text color.

proc univariate data=Measures;
   probplot Length1 Length2 / normal(mu=10 sigma=0.3)
                              square ctext=blue;
run;

Table 4.18 through Table 4.20 list the PROBPLOT options by function. For complete descriptions, see the sections Dictionary of Options and Dictionary of Common Options. Options can be any of the following:

  • primary options

  • secondary options

  • general options

Distribution Options

Table 4.18 lists options for requesting a theoretical distribution.

Table 4.18: Primary Options for Theoretical Distributions

Option

Description

BETA(beta-options)

specifies beta probability plot for shape parameters α and β specified with mandatory ALPHA= and BETA= beta-options

EXPONENTIAL(exponential-options)

specifies exponential probability plot

GAMMA(gamma-options)

specifies gamma probability plot for shape parameter α specified with mandatory ALPHA= gamma-option

GUMBEL(Gumbel-options)

specifies Gumbel probability plot

LOGNORMAL(lognormal-options)

specifies lognormal probability plot for shape parameter σ specified with mandatory SIGMA= lognormal-option

NORMAL(normal-options)

specifies normal probability plot

PARETO(Pareto-options)

specifies generalized Pareto probability plot for shape parameter α specified with mandatory ALPHA= Pareto-option

POWER(power-options)

specifies power function probability plot for shape parameter α specified with mandatory ALPHA= power-option

RAYLEIGH(Rayleigh-options)

specifies Rayleigh probability plot

WEIBULL(Weibull-options)

specifies three-parameter Weibull probability plot for shape parameter c specified with mandatory C= Weibull-option

WEIBULL2(Weibull2-options)

specifies two-parameter Weibull probability plot


Table 4.19 lists secondary options that specify distribution parameters and control the display of a distribution reference line. Specify these options in parentheses after the distribution keyword. For example, you can request a normal probability plot with a distribution reference line by specifying the NORMAL option as follows:

proc univariate;
   probplot Length / normal(mu=10 sigma=0.3 color=red);
run;

The MU= and SIGMA= normal-options display a distribution reference line that corresponds to the normal distribution with mean μ = 10 and standard deviation σ = 0.3, and the COLOR= normal-option specifies the color for the line.

Table 4.19: Secondary Distribution Options

Option

Description

Options Used with All Distributions

COLOR=

specifies color of distribution reference line

L=

specifies line type of distribution reference line

W=

specifies width of distribution reference line

Beta-Options

ALPHA=

specifies mandatory shape parameter α

BETA=

specifies mandatory shape parameter β

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Exponential-Options

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Gamma-Options

ALPHA=

specifies mandatory shape parameter α

ALPHADELTA=

specifies change in successive estimates of α̂ at which the Newton-Raphson approximation of α̂ terminates

ALPHAINITIAL=

specifies initial value for α in the Newton-Raphson approximation of α̂

MAXITER=

specifies maximum number of iterations in the Newton-Raphson approximation of α̂

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Gumbel-Options

MU=

specifies μ₀ for distribution reference line

SIGMA=

specifies σ₀ for distribution reference line

Lognormal-Options

SIGMA=

specifies mandatory shape parameter σ

SLOPE=

specifies slope of distribution reference line

THETA=

specifies θ₀ for distribution reference line

ZETA=

specifies ζ₀ for distribution reference line (slope is exp(ζ₀))

Normal-Options

MU=

specifies μ₀ for distribution reference line

SIGMA=

specifies σ₀ for distribution reference line

Pareto-Options

ALPHA=

specifies mandatory shape parameter α

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Power-Options

ALPHA=

specifies mandatory shape parameter α

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Rayleigh-Options

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Weibull-Options

C=

specifies mandatory shape parameter c

ITPRINT

requests table of iteration history and optimizer details

MAXITER=

specifies maximum number of iterations in the Newton-Raphson approximation of ĉ

SIGMA=

specifies σ₀ for distribution reference line

THETA=

specifies θ₀ for distribution reference line

Weibull2-Options

C=

specifies c₀ for distribution reference line (slope is 1/c₀)

ITPRINT

requests table of iteration history and optimizer details

MAXITER=

specifies maximum number of iterations in the Newton-Raphson approximation of ĉ

SIGMA=

specifies σ₀ for distribution reference line (intercept is log(σ₀))

SLOPE=

specifies slope of distribution reference line

THETA=

specifies known lower threshold θ₀


General Graphics Options

Table 4.20 summarizes the general options for enhancing probability plots.

Table 4.20: General Graphics Options

Option

Description

General Graphics Options

GRID

creates a grid

HREF=

specifies reference lines perpendicular to the horizontal axis

HREFLABELS=

specifies labels for HREF= lines

HREFLABPOS=

specifies position for HREF= line labels

NOHLABEL

suppresses label for horizontal axis

NOVLABEL

suppresses label for vertical axis

NOVTICK

suppresses tick marks and tick mark labels for vertical axis

PCTLORDER=

specifies tick mark labels for percentile axis

ROTATE

switches horizontal and vertical axes

SQUARE

displays plot in square format

VREF=

specifies reference lines perpendicular to the vertical axis

VREFLABELS=

specifies labels for VREF= lines

VREFLABPOS=

specifies horizontal position of labels for VREF= lines

VAXISLABEL=

specifies label for vertical axis

Options for Traditional Graphics Output

ANNOTATE=

specifies annotate data set

CAXIS=

specifies color for axis

CFRAME=

specifies color for frame

CGRID=

specifies color for grid lines

CHREF=

specifies colors for HREF= lines

CSTATREF=

specifies colors for STATREF= lines

CTEXT=

specifies color for text

CVREF=

specifies colors for VREF= lines

DESCRIPTION=

specifies description for plot in graphics catalog

FONT=

specifies software font for text

HAXIS=

specifies AXIS statement for horizontal axis

HEIGHT=

specifies height of text used outside framed areas

HMINOR=

specifies number of horizontal minor tick marks

INFONT=

specifies software font for text inside framed areas

INHEIGHT=

specifies height of text inside framed areas

LGRID=

specifies a line type for grid lines

LHREF=

specifies line types for HREF= lines

LSTATREF=

specifies line types for STATREF= lines

LVREF=

specifies line types for VREF= lines

NAME=

specifies name for plot in graphics catalog

NOFRAME

suppresses frame around plotting area

PCTLMINOR

requests minor tick marks for percentile axis

WAXIS=

specifies line thickness for axes and frame

WGRID=

specifies line thickness for grid

TURNVLABELS

turns and vertically strings out characters in labels for vertical axis

VAXIS=

specifies AXIS statement for vertical axis

VMINOR=

specifies number of vertical minor tick marks

Options for ODS Graphics Output

NOLINELEGEND

suppresses legend for distribution reference line

ODSFOOTNOTE=

specifies footnote displayed on plot

ODSFOOTNOTE2=

specifies secondary footnote displayed on plot

ODSTITLE=

specifies title displayed on plot

ODSTITLE2=

specifies secondary title displayed on plot

OVERLAY

overlays plots for different class levels (ODS Graphics only)

Options for Comparative Plots

ANNOKEY

applies annotation requested in ANNOTATE= data set to key cell only

CFRAMESIDE=

specifies color for filling frame for row labels

CFRAMETOP=

specifies color for filling frame for column labels

CPROP=

specifies color for proportion of frequency bar

CTEXTSIDE=

specifies color for row labels

CTEXTTOP=

specifies color for column labels

INTERTILE=

specifies distance between tiles

NCOLS=

specifies number of columns in comparative probability plot

NROWS=

specifies number of rows in comparative probability plot

Miscellaneous Options

CONTENTS=

specifies table of contents entry for probability plot grouping

NADJ=

adjusts sample size when computing percentiles

RANKADJ=

adjusts ranks when computing percentiles


Dictionary of Options

The following entries provide detailed descriptions of options in the PROBPLOT statement. Options marked with † are applicable only when traditional graphics are produced. See the section Dictionary of Common Options for detailed descriptions of options common to all plot statements.

ALPHA=value-list | EST

specifies the mandatory shape parameter α for probability plots requested with the BETA, GAMMA, PARETO, and POWER options. Enclose the ALPHA= option in parentheses after the distribution keyword. If you specify ALPHA=EST, a maximum likelihood estimate is computed for α.

BETA(ALPHA=value | EST  BETA=value | EST <beta-options>)

creates a beta probability plot for each combination of the required shape parameters α and β specified by the required ALPHA= and BETA= beta-options. If you specify ALPHA=EST and BETA=EST, the procedure creates a plot based on maximum likelihood estimates for α and β. You can specify the SCALE= beta-option as an alias for the SIGMA= beta-option and the THRESHOLD= beta-option as an alias for the THETA= beta-option.

To obtain graphical estimates of α and β, specify lists of values in the ALPHA= and BETA= beta-options, and select the combination of α and β that most nearly linearizes the point pattern. To assess the point pattern, you can add a diagonal distribution reference line corresponding to lower threshold parameter θ₀ and scale parameter σ₀ with the THETA= and SIGMA= beta-options. Alternatively, you can add a line that corresponds to estimated values of θ₀ and σ₀ with the beta-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ₀, and σ₀ is a good fit.
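A minimal sketch of graphical estimation with the BETA option follows. It assumes a hypothetical data set Fractions with a numeric variable Ratio whose values lie strictly between 0 and 1, so that THETA=0 and SIGMA=1 describe its range; the shape values are illustrative only.

```sas
/* Hypothetical data set Fractions; Ratio is assumed to lie in (0, 1) */
proc univariate data=Fractions;
   /* One plot per (alpha, beta) combination: 2 x 2 = 4 plots.        */
   /* THETA=0 and SIGMA=1 add a reference line for the (0, 1) range.  */
   probplot Ratio / beta(alpha=1.5 2.5 beta=2 3 theta=0 sigma=1);
   /* One plot based on maximum likelihood estimates of alpha and beta */
   probplot Ratio / beta(alpha=est beta=est theta=0 sigma=1);
run;
```

Among the four plots from the first statement, the panel whose points follow the reference line most closely suggests the better pair of shape values.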

BETA=value-list | EST
B=value-list | EST

specifies the mandatory shape parameter β for probability plots requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. If you specify BETA=EST, a maximum likelihood estimate is computed for β.

C=value-list | EST

specifies the shape parameter c for probability plots requested with the WEIBULL and WEIBULL2 options. Enclose this option in parentheses after the WEIBULL or WEIBULL2 option. C= is a required Weibull-option in the WEIBULL option; in this situation, it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can optionally specify C=value or C=EST as a Weibull2-option with the WEIBULL2 option to request a distribution reference line; in this situation, you must also specify the Weibull2-option SIGMA=value or SIGMA=EST.

† CGRID=color

specifies the color for grid lines when a grid displays on the plot. This option also produces a grid.

EXPONENTIAL<(exponential-options)>
EXP<(exponential-options)>

creates an exponential probability plot. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= exponential-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter σ₀ with the exponential-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the exponential distribution with parameters θ₀ and σ₀ is a good fit. You can specify the SCALE= exponential-option as an alias for the SIGMA= exponential-option and the THRESHOLD= exponential-option as an alias for the THETA= exponential-option.
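As a sketch, assuming the Measures data set used in the earlier examples, the following two statements should request the same exponential probability plot with a reference line based on estimated θ₀ and σ₀, because THRESHOLD= and SCALE= are aliases for THETA= and SIGMA=.

```sas
proc univariate data=Measures;
   /* Reference line from estimated threshold and scale */
   probplot Length / exponential(theta=est sigma=est);
   /* Same request, written with the THRESHOLD= and SCALE= aliases */
   probplot Length / exponential(threshold=est scale=est);
run;
```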

GAMMA(ALPHA=value | EST <gamma-options>)

creates a gamma probability plot for each value of the shape parameter α given by the mandatory ALPHA= gamma-option. If you specify ALPHA=EST, the procedure creates a plot based on a maximum likelihood estimate for α. To obtain a graphical estimate of α, specify a list of values for the ALPHA= gamma-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= gamma-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter σ₀ with the gamma-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the gamma distribution with parameters α, θ₀, and σ₀ is a good fit. You can specify the SCALE= gamma-option as an alias for the SIGMA= gamma-option and the THRESHOLD= gamma-option as an alias for the THETA= gamma-option.
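The two approaches for choosing α can be sketched as follows, assuming the Measures data set with a positive variable Length; the listed α values are illustrative only.

```sas
proc univariate data=Measures;
   /* Graphical estimation: one gamma plot per listed shape value */
   probplot Length / gamma(alpha=0.5 1 2);
   /* One plot based on the maximum likelihood estimate of alpha,  */
   /* with a reference line from estimated threshold and scale     */
   probplot Length / gamma(alpha=est theta=est sigma=est);
run;
```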

GRID

displays a grid. Grid lines are reference lines that are perpendicular to the percentile axis at major tick marks.

GUMBEL<(Gumbel-options)>

creates a Gumbel probability plot. To assess the point pattern, add a diagonal distribution reference line corresponding to μ₀ and σ₀ with the MU= and SIGMA= Gumbel-options. Alternatively, you can add a line corresponding to estimated values of the location parameter μ₀ and the scale parameter σ₀ with the Gumbel-options MU=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the Gumbel distribution with parameters μ₀ and σ₀ is a good fit.
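A short sketch, assuming the Measures data set; the line color is an arbitrary choice made through the COLOR= option that applies to all distributions.

```sas
proc univariate data=Measures;
   /* Reference line from maximum likelihood estimates of mu and sigma */
   probplot Length / gumbel(mu=est sigma=est color=red);
run;
```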

† LGRID=linetype

specifies the line type for the grid requested by the GRID option. By default, LGRID=1, which produces a solid line.

LOGNORMAL(SIGMA=value | EST <lognormal-options>)
LNORM(SIGMA=value | EST <lognormal-options>)

creates a lognormal probability plot for each value of the shape parameter σ given by the mandatory SIGMA= lognormal-option. If you specify SIGMA=EST, the procedure creates a plot based on a maximum likelihood estimate for σ. To obtain a graphical estimate of σ, specify a list of values for the SIGMA= lognormal-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and ζ₀ with the THETA= and ZETA= lognormal-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter ζ₀ with the lognormal-options THETA=EST and ZETA=EST. Agreement between the reference line and the point pattern indicates that the lognormal distribution with parameters σ, θ₀, and ζ₀ is a good fit. You can specify the THRESHOLD= lognormal-option as an alias for the THETA= lognormal-option and the SCALE= lognormal-option as an alias for the ZETA= lognormal-option. See Example 4.26.
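A minimal sketch of both ways to choose σ, assuming the Measures data set with a positive variable Length; the listed σ values are illustrative only.

```sas
proc univariate data=Measures;
   /* Graphical estimation: one lognormal plot per listed sigma value */
   probplot Length / lognormal(sigma=0.5 1 1.5);
   /* Maximum likelihood estimate of sigma, with a reference line     */
   /* from estimated threshold (theta) and scale (zeta)               */
   probplot Length / lognormal(sigma=est theta=est zeta=est);
run;
```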

MU=value | EST

specifies the parameter μ for a probability plot requested with the GUMBEL or NORMAL option; μ is the mean of the normal distribution and the location parameter of the Gumbel distribution. Enclose MU= in parentheses after the distribution keyword. For the normal distribution, you can specify MU=EST to request a distribution reference line with μ equal to the sample mean. For the Gumbel distribution, MU=EST requests a maximum likelihood estimate of μ.

NADJ=value

specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. By default, NADJ=1/4. Refer to Chambers et al. (1983).

NOLINELEGEND
NOLEGEND

suppresses the legend for the optional distribution reference line. The NOLINELEGEND option applies only to ODS Graphics output.

NORMAL<(normal-options)>

creates a normal probability plot. This is the default if you omit a distribution option. To assess the point pattern, you can add a diagonal distribution reference line corresponding to μ₀ and σ₀ with the MU= and SIGMA= normal-options. Alternatively, you can add a line corresponding to estimated values of μ₀ and σ₀ with the normal-options MU=EST and SIGMA=EST; the estimates of the mean and the standard deviation are the sample mean and sample standard deviation. Agreement between the reference line and the point pattern indicates that the normal distribution with parameters μ₀ and σ₀ is a good fit.
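For example, assuming the Measures data set and ODS Graphics output, the following sketch draws a reference line from the sample mean and sample standard deviation and suppresses the legend for that line.

```sas
ods graphics on;   /* NOLINELEGEND applies only to ODS Graphics output */

proc univariate data=Measures;
   /* mu = sample mean, sigma = sample standard deviation */
   probplot Length / normal(mu=est sigma=est) nolinelegend;
run;
```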

PARETO(ALPHA=value | EST <Pareto-options>)

creates a generalized Pareto probability plot for each value of the shape parameter α given by the mandatory ALPHA= Pareto-option. If you specify ALPHA=EST, the procedure creates a plot based on a maximum likelihood estimate for α. To obtain a graphical estimate of α, specify a list of values for the ALPHA= Pareto-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= Pareto-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter σ₀ with the Pareto-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the generalized Pareto distribution with parameters α, θ₀, and σ₀ is a good fit.

† PCTLMINOR

requests minor tick marks for the percentile axis. The HMINOR= option overrides the minor tick marks requested by the PCTLMINOR option.

PCTLORDER=values

specifies the tick marks that are labeled on the theoretical percentile axis. Because the values are percentiles, the labels must be between 0 and 100, exclusive. The values must be listed in increasing order and must cover the plotted percentile range. Otherwise, the default values of 1, 5, 10, 25, 50, 75, 90, 95, and 99 are used.
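A sketch, assuming the Measures data set, that labels a reduced set of percentiles and adds a grid. The values are increasing and lie strictly between 0 and 100; if they do not cover the plotted percentile range, the default labels are used instead.

```sas
proc univariate data=Measures;
   /* Label only these percentiles on the theoretical axis */
   probplot Length / normal(mu=est sigma=est)
                     pctlorder=1 10 50 90 99
                     grid;
run;
```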

POWER(ALPHA=value | EST <power-options>)

creates a power function probability plot for each value of the shape parameter α given by the mandatory ALPHA= power-option. If you specify ALPHA=EST, the procedure creates a plot based on a maximum likelihood estimate for α. To obtain a graphical estimate of α, specify a list of values for the ALPHA= power-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= power-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter σ₀ with the power-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the power function distribution with parameters α, θ₀, and σ₀ is a good fit.

RANKADJ=value

specifies the adjustment value added to the ranks in the calculation of theoretical percentiles. By default, RANKADJ=-3/8, as recommended by Blom (1958). Refer to Chambers et al. (1983) for additional information.
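Taken together, NADJ= and RANKADJ= determine the plotting position p_i at which each ordered observation is placed on the percentile axis. The following sketch assumes that r_i denotes the rank of the i-th ordered value among n nonmissing observations, and includes a small worked case for the normal distribution with the default adjustments.

```latex
p_i = \frac{r_i + \mathrm{RANKADJ}}{n + \mathrm{NADJ}}
    = \frac{r_i - 3/8}{n + 1/4} \qquad \text{(defaults)}

% Worked case: n = 10, smallest observation (r_1 = 1)
p_1 = \frac{1 - 0.375}{10 + 0.25} = \frac{0.625}{10.25} \approx 0.0610,
\qquad \Phi^{-1}(p_1) \approx -1.55
```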

RAYLEIGH<(Rayleigh-options)>

creates a Rayleigh probability plot. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= Rayleigh-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ₀ and the scale parameter σ₀ with the Rayleigh-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the Rayleigh distribution with parameters θ₀ and σ₀ is a good fit.

ROTATE

switches the horizontal and vertical axes so that the theoretical percentiles are plotted vertically while the data are plotted horizontally. Regardless of whether the plot has been rotated, horizontal axis options (such as HAXIS=) still refer to the horizontal axis, and vertical axis options (such as VAXIS=) still refer to the vertical axis. All other options that depend on axis placement adjust to the rotated axes.
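For instance, assuming the Measures data set, the following sketch places the theoretical percentiles on the vertical axis and the data on the horizontal axis, in a square frame.

```sas
proc univariate data=Measures;
   /* Percentiles plotted vertically, data horizontally */
   probplot Length / normal(mu=est sigma=est) rotate square;
run;
```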

SIGMA=value-list | EST

specifies the parameter σ, where σ > 0. Alternatively, you can specify SIGMA=EST to request a maximum likelihood estimate for σ. The interpretation and use of the SIGMA= option depend on the distribution option with which it is used. See Table 4.21 for a summary of how to use the SIGMA= option. You must enclose this option in parentheses after the distribution option.

Table 4.21: Uses of the SIGMA= Option

Distribution Option

Use of the SIGMA= Option

BETA, EXPONENTIAL, GAMMA, PARETO, POWER, RAYLEIGH, WEIBULL

THETA=θ₀ and SIGMA=σ₀ request a distribution reference line corresponding to θ₀ and σ₀.

GUMBEL

MU=μ₀ and SIGMA=σ₀ request a distribution reference line corresponding to μ₀ and σ₀.

LOGNORMAL

SIGMA=σ₁ … σₙ requests n probability plots with shape parameters σ₁ … σₙ. The SIGMA= option must be specified.

NORMAL

MU=μ₀ and SIGMA=σ₀ request a distribution reference line corresponding to μ₀ and σ₀. SIGMA=EST requests a line with σ₀ equal to the sample standard deviation.

WEIBULL2

SIGMA=σ₀ and C=c₀ request a distribution reference line corresponding to σ₀ and c₀.


SLOPE=value | EST

specifies the slope for a distribution reference line requested with the LOGNORMAL and WEIBULL2 options. Enclose the SLOPE= option in parentheses after the distribution option. When you use the SLOPE= lognormal-option with the LOGNORMAL option, you must also specify a threshold parameter value with the THETA= lognormal-option to request the line. The SLOPE= lognormal-option is an alternative to the ZETA= lognormal-option for specifying ζ₀, because the slope is equal to exp(ζ₀).

When you use the SLOPE= Weibull2-option with the WEIBULL2 option, you must also specify a scale parameter value with the SIGMA= Weibull2-option to request the line. The SLOPE= Weibull2-option is an alternative to the C= Weibull2-option for specifying c₀, because the slope is equal to 1/c₀.

For example, the first and second PROBPLOT statements produce the same probability plots, and the third and fourth PROBPLOT statements produce the same probability plots:

proc univariate data=Measures;
   probplot Width / lognormal(sigma=2 theta=0 zeta=0);
   probplot Width / lognormal(sigma=2 theta=0 slope=1);
   probplot Width / weibull2(sigma=2 theta=0 c=.25);
   probplot Width / weibull2(sigma=2 theta=0 slope=4);
run;
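The equivalences in this example follow from the quantile functions of the two distributions. The derivation below is a sketch that assumes the lognormal plot places the data against exp(σΦ⁻¹(p)) and the two-parameter Weibull plot places log(x − θ₀) against log(−log(1 − p)), which is consistent with the slope and intercept descriptions in Table 4.19.

```latex
% Lognormal: threshold \theta_0, scale \zeta_0, shape \sigma
x_p = \theta_0 + e^{\zeta_0}\, e^{\sigma\,\Phi^{-1}(p)}
\quad\Rightarrow\quad \text{intercept } \theta_0,\ \text{slope } e^{\zeta_0}
\quad (\zeta_0 = 0 \Rightarrow \text{slope } 1)

% Two-parameter Weibull: known threshold \theta_0, scale \sigma_0, shape c_0
\log(x_p - \theta_0) = \log\sigma_0 + \frac{1}{c_0}\,\log\bigl(-\log(1-p)\bigr)
\quad\Rightarrow\quad \text{intercept } \log\sigma_0,\ \text{slope } 1/c_0
\quad (c_0 = 0.25 \Rightarrow \text{slope } 4)
```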

SQUARE

displays the probability plot in a square frame. By default, the plot is in a rectangular frame.

THETA=value | EST
THRESHOLD=value | EST

specifies the lower threshold parameter θ for plots requested with the BETA, EXPONENTIAL, GAMMA, PARETO, POWER, RAYLEIGH, LOGNORMAL, WEIBULL, and WEIBULL2 options. Enclose the THETA= option in parentheses after a distribution option. When used with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ₀, for which the default is 0. When used with the other distribution options, the THETA= option specifies θ₀ for a distribution reference line; alternatively, in this situation, you can specify THETA=EST to request a maximum likelihood estimate for θ₀. To request the line, you must also specify a scale parameter.

WEIBULL(C=value | EST <Weibull-options>)
WEIB(C=value | EST <Weibull-options>)

creates a three-parameter Weibull probability plot for each value of the required shape parameter c specified by the mandatory C= Weibull-option. To create a plot that is based on a maximum likelihood estimate for c, specify C=EST. To obtain a graphical estimate of c, specify a list of values in the C= Weibull-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA= Weibull-options. Alternatively, you can add a line corresponding to estimated values of θ₀ and σ₀ with the Weibull-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ₀, and σ₀ is a good fit. You can specify the SCALE= Weibull-option as an alias for the SIGMA= Weibull-option and the THRESHOLD= Weibull-option as an alias for the THETA= Weibull-option.
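A minimal sketch, assuming the Measures data set with a positive variable Length; the listed values of c are illustrative only.

```sas
proc univariate data=Measures;
   /* Graphical estimation of c: one plot per listed value */
   probplot Length / weibull(c=1 2 3);
   /* Maximum likelihood estimate of c, with a reference line */
   /* from estimated threshold and scale                      */
   probplot Length / weibull(c=est theta=est sigma=est);
run;
```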

WEIBULL2<(Weibull2-options)>
W2<(Weibull2-options)>

creates a two-parameter Weibull probability plot. You should use the WEIBULL2 option when your data have a known lower threshold θ₀, which is 0 by default. To specify the threshold value θ₀, use the THETA= Weibull2-option. An advantage of the two-parameter Weibull plot over the three-parameter Weibull plot is that the parameters c and σ can be estimated from the slope and intercept of the point pattern. A disadvantage is that the two-parameter Weibull distribution applies only in situations where the threshold parameter is known. To obtain a graphical estimate of θ₀, specify a list of values for the THETA= Weibull2-option and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to σ₀ and c₀ with the SIGMA= and C= Weibull2-options. Alternatively, you can add a distribution reference line corresponding to estimated values of σ₀ and c₀ with the Weibull2-options SIGMA=EST and C=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c₀, θ₀, and σ₀ is a good fit. You can specify the SCALE= Weibull2-option as an alias for the SIGMA= Weibull2-option and the SHAPE= Weibull2-option as an alias for the C= Weibull2-option.
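The following sketch assumes the Measures data set and that every value of Length exceeds each listed threshold, so that the logarithm of the distance from the threshold is defined. The threshold values are illustrative only.

```sas
proc univariate data=Measures;
   /* Graphical estimation of the threshold: one plot per listed value */
   probplot Length / weibull2(theta=0 0.5 1);
   /* Known threshold 0; reference line from estimated sigma and c */
   probplot Length / weibull2(theta=0 sigma=est c=est);
run;
```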

† WGRID=n

specifies the line thickness for the grid when producing traditional graphics. The option does not apply to ODS Graphics output.

ZETA=value | EST

specifies a value for the scale parameter ζ for the lognormal probability plots requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. To request a distribution reference line with intercept θ₀ and slope exp(ζ₀), specify THETA=θ₀ and ZETA=ζ₀.