HISTOGRAM Statement: CAPABILITY Procedure

Dictionary of Options

The following entries provide detailed descriptions of options specific to the HISTOGRAM statement. The notes Traditional Graphics, ODS Graphics, and Line Printer identify options that apply to traditional graphics, ODS Graphics output, and line printer plots, respectively. See Dictionary of Common Options: CAPABILITY Procedure for detailed descriptions of options common to all the plot statements.

ALPHA=value-list

specifies the shape parameter $\alpha $ for fitted curves requested with the BETA, GAMMA, PARETO, and POWER options. Enclose the ALPHA= option in parentheses after the distribution keyword. If you do not specify a value for $\alpha $, the procedure calculates a maximum likelihood estimate. See Example 5.8. You can specify A= as an alias for ALPHA= if you use it as a beta-option. You can specify SHAPE= as an alias for ALPHA= if you use it as a gamma-option.

BARLABEL=COUNT | PERCENT | PROPORTION

Traditional Graphics: displays labels above the histogram bars. If you specify BARLABEL=COUNT, the label shows the number of observations associated with a given bar. If you specify BARLABEL=PERCENT, the label shows the percent of observations represented by that bar. If you specify BARLABEL=PROPORTION, the label displays the proportion of observations associated with the bar.
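
As a minimal sketch, the following statements label each bar with its percentage. The data set Parts and its values are hypothetical, and ODS Graphics is turned off on the assumption that traditional graphics output is wanted, since BARLABEL= is marked as a Traditional Graphics option:

```sas
/* Hypothetical sample data; any numeric analysis variable would do */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

ods graphics off;   /* assumption: request traditional graphics output */

proc capability data=Parts noprint;
   histogram Length / barlabel=percent;   /* label each bar with its percent */
run;
```

Replacing PERCENT with COUNT or PROPORTION changes only the quantity shown in the labels.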

BARWIDTH=value

Traditional Graphics: specifies the width of the histogram bars in screen percent units.

BETA<(beta-options )>

displays a fitted beta density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{(x-\theta )^{\alpha -1}(\sigma +\theta -x)^{\beta -1}}{ B(\alpha ,\beta )\sigma ^{(\alpha +\beta -1)}} hv &  \mbox{for $\theta < x < \theta + \sigma $} \\ 0 &  \mbox{for $x \leq \theta $ or $x \geq \theta + \sigma $ } \end{array} \right.  \]

where $B(\alpha ,\beta )=\frac{\Gamma (\alpha )\Gamma (\beta )}{\Gamma (\alpha +\beta )}$ and

$\theta =$ lower threshold parameter (lower endpoint parameter)
$\sigma =$ scale parameter $(\sigma >0)$
$\alpha =$ shape parameter $(\alpha >0)$
$\beta =$ shape parameter $(\beta >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The beta distribution is bounded below by the parameter $\theta $ and above by the value $\theta + \sigma $. You can specify $\theta $ and $\sigma $ by using the THETA= and SIGMA= beta-options. The following statements fit a beta distribution bounded between 50 and 75 by using maximum likelihood estimates for $\alpha $ and $\beta $:

proc capability;
   histogram length / beta(theta=50 sigma=25);
run;

In general, the default values for THETA= and SIGMA= are 0 and 1, respectively. You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for $\theta $ and $\sigma $.

The beta distribution has two shape parameters, $\alpha $ and $\beta $. If these parameters are known, you can specify their values with the ALPHA= and BETA= beta-options. If you do not specify values, the procedure calculates maximum likelihood estimates for $\alpha $ and $\beta $.

The BETA option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the BETA option. See Example 5.8. Also see Formulas for Fitted Curves.

BETA=value-list
B=value-list

specifies the second shape parameter $\beta $ for beta density curves requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. If you do not specify a value for $\beta $, the procedure calculates a maximum likelihood estimate. See Example 5.8.

BMCFILL=color

Traditional Graphics: specifies the fill color for a box-and-whisker plot in a bottom margin requested with the BMPLOT= option. By default, the box-and-whisker plot is not filled.

BMCFRAME=color

Traditional Graphics: specifies the color for filling the frame of a bottom margin plot requested with the BMPLOT= option. By default, this area is not filled.

BMCOLOR=color

Traditional Graphics: specifies the color of a carpet plot, or the outline color of a box-and-whisker plot, in a bottom margin plot requested with the BMPLOT= option.

BMMARGIN=height

Traditional Graphics: specifies the height in screen percentage units of a bottom margin plot requested with the BMPLOT= option. By default, a bottom margin plot occupies 15 percent of the vertical display space.

BMPLOT=CARPET | DOTPLOT | SKELETAL | SCHEMATIC

Traditional Graphics: produces a carpet plot, dot plot, or box-and-whisker plot along the bottom margin of a histogram. A carpet plot or dot plot shows the distribution of individual observations along the histogram’s horizontal axis. A carpet plot represents each observation with a vertical line. A dot plot marks each observation with a symbol. A box-and-whisker plot gives a summary of the data distribution that a histogram alone does not provide. The left and right edges of the box are located at the first and third quartiles. A central vertical line is drawn at the median and a symbol is plotted inside the box at the mean. If you specify the SKELETAL keyword, a box-and-whisker plot is produced with whiskers extending to the minimum and maximum values. If you specify SCHEMATIC, a schematic box-and-whisker plot is produced. In a schematic box-and-whisker plot, the whiskers extend to the smallest value within the lower fence and the largest value within the upper fence. Fences are defined in terms of the interquartile range (IQR). The lower fence is 1.5 IQR below the first quartile and the upper fence is 1.5 IQR above the third quartile. Each observation outside the fences is plotted with a symbol.
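
The following sketch requests a schematic box-and-whisker plot in the bottom margin and combines it with the BMMARGIN= and BMCOLOR= options described earlier. The data set Parts, its values, and the chosen color and height are hypothetical, and turning ODS Graphics off is an assumption made to obtain traditional graphics:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

ods graphics off;   /* assumption: request traditional graphics output */

proc capability data=Parts noprint;
   histogram Length / bmplot=schematic   /* whiskers stop at the fences      */
                      bmmargin=20        /* 20 percent of the vertical space */
                      bmcolor=blue;      /* outline color of the box         */
run;
```

Specifying BMPLOT=SKELETAL instead would extend the whiskers to the minimum and maximum values rather than to the fences.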

C=value-list

specifies the shape parameter c for Weibull density curves requested with the WEIBULL option. Enclose the C= option in parentheses after the WEIBULL option. If you do not specify a value for c, the procedure calculates a maximum likelihood estimate. See Example 5.9. You can specify the SHAPE= option as an alias for the C= option.

C=value-list | MISE

specifies the standardized bandwidth parameter c for kernel density estimates requested with the KERNEL option. Enclose the C= option in parentheses after the KERNEL option. You can specify up to five values to request multiple estimates. You can also specify the C=MISE option, which produces the estimate with a bandwidth that minimizes the approximate mean integrated square error (MISE). For example, the following statements compute three density estimates:

proc capability;
   histogram length / kernel(c=0.5 1.0 mise);
run;

The first two estimates have standardized bandwidths of 0.5 and 1.0, respectively, and the third has a bandwidth that minimizes the approximate MISE.

You can also use the C= option with the K= option, which specifies the kernel function, to compute multiple estimates. If you specify more kernel functions than bandwidths, the last bandwidth in the list is repeated for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, the last kernel function is repeated for the remaining estimates. For example, the following statements compute three density estimates:

proc capability;
   histogram length / kernel(c=1 2 3 k=normal quadratic);
run;

The first uses a normal kernel and a bandwidth of 1, the second uses a quadratic kernel and a bandwidth of 2, and the third uses a quadratic kernel and a bandwidth of 3. See Example 5.12.

If you do not specify a value for c, the bandwidth that minimizes the approximate MISE is used for all the estimates.

CBARLINE=color

Traditional Graphics: specifies the color of the outline of histogram bars. This option overrides the C= option in the SYMBOL1 statement.

CFILL=color

Traditional Graphics: specifies a color used to fill the bars of the histogram (or the area under a fitted curve if you also specify the FILL option). See the entries for the FILL and PFILL= options for additional details. See Figure 5.11 and Output 5.8.1. Refer to SAS/GRAPH: Reference for a list of colors. By default, bars are filled with an appropriate color from the ODS style.

CGRID=color

Traditional Graphics: specifies the color for grid lines requested with the GRID option. By default, grid lines are the same color as the axes. If you use CGRID=, you do not need to specify the GRID option.

CLIPCURVES

scales the vertical axis without taking fitted curves into consideration. Curves that extend above the tallest histogram bar may be clipped. You can use this option to avoid compression of the histogram bars due to extremely high fitted curve peaks.

CLIPREF

Traditional Graphics: draws reference lines requested with the HREF= and VREF= options behind the histogram bars. By default, reference lines are drawn in front of the histogram bars.

CLIPSPEC=CLIP | NOFILL

Traditional Graphics: specifies that histogram bars are clipped at the upper and lower specification limit lines when there are no observations outside the specification limits. The bar intersecting the lower specification limit is clipped if there are no observations less than the lower limit; the bar intersecting the upper specification limit is clipped if there are no observations greater than the upper limit. If you specify CLIPSPEC=CLIP, the histogram bar is truncated at the specification limit. If you specify CLIPSPEC=NOFILL, the portion of a filled histogram bar outside the specification limit is left unfilled. Specifying CLIPSPEC=NOFILL when histogram bars are not filled has no effect.

CURVELEGEND=name | NONE

specifies the name of a LEGEND statement describing the legend for specification limits and fitted curves. Specifying CURVELEGEND=NONE suppresses the legend for fitted curves; this is equivalent to specifying the NOCURVELEGEND option.

DELTA=value-list

specifies the first shape parameter $\delta $ for Johnson $S_ B$ and Johnson $S_ U$ density curves requested with the SB and SU options. Enclose the DELTA= option in parentheses after the SB or SU option. If you do not specify a value for $\delta $, the procedure calculates an estimate.

EDFNSAMPLES=value

specifies the number of simulation samples used to compute p-values for EDF goodness-of-fit statistics for density curves requested with the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose the EDFNSAMPLES= option in parentheses after the distribution option. The default value is 500.

EDFSEED=value

specifies an integer value used to start the pseudo-random number generator when creating simulation samples for computing EDF goodness-of-fit statistic p-values for density curves requested with the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose the EDFSEED= option in parentheses after the distribution option. By default, the procedure uses a random number seed generated from reading the time of day from the computer’s clock.
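
Because the default seed comes from the clock, simulated p-values can differ from run to run. As a sketch, the following statements fix the seed and raise the number of simulation samples for a Gumbel fit. The data set Parts, its values, and the seed value are hypothetical:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts;
   /* 1000 simulation samples instead of the default 500, */
   /* with a fixed seed so the p-values are repeatable    */
   histogram Length / gumbel(edfnsamples=1000 edfseed=12345);
run;
```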

ENDPOINTS
ENDPOINTS=value-list

specifies that histogram interval endpoints, rather than midpoints, are aligned with horizontal axis tick marks. If you specify ENDPOINTS, the number of histogram intervals is based on the number of observations by using the method of Terrell and Scott (1985). If you specify ENDPOINTS=value-list, the values must be listed in increasing order and must be evenly spaced. All observations in the input data set, as well as any specification limits, must lie between the first and last values specified. The same value-list is used for all variables.
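
For example, the following sketch aligns tick marks with evenly spaced endpoints that span the data. The data set Parts and its values are hypothetical; the list is chosen so that every observation lies between the first and last endpoint:

```sas
/* Hypothetical sample data, all values between 10.0 and 10.3 */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts noprint;
   /* evenly spaced endpoints in increasing order */
   histogram Length / endpoints=10.0 to 10.3 by 0.05;
run;
```

If specification limits were also in effect, they would need to fall inside the same range.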

EXPONENTIAL<(exponential-options )>
EXP<(exponential-options )>

displays a fitted exponential density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{h v}{\sigma } \exp (-(\frac{x - \theta }{\sigma })) &  \mbox{for $x \geq \theta $} \\ 0 &  \mbox{for $x < \theta $} \end{array} \right.  \]

where

$\theta =$ threshold parameter
$\sigma =$ scale parameter $(\sigma >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ must be less than or equal to the minimum data value. You can specify $\theta $ with the THETA= exponential-option. The default value for $\theta $ is zero. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. You can specify $\sigma $ with the SIGMA= exponential-option. By default, a maximum likelihood estimate is computed for $\sigma $. For example, the following statements fit an exponential curve with $\theta =10$ and with a maximum likelihood estimate for $\sigma $:

proc capability;
   histogram / exponential(theta=10 l=2 color=red);
run;

The curve is red and has a line type of 2. The EXPONENTIAL option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the EXPONENTIAL option. See Formulas for Fitted Curves.

FILL

Traditional Graphics, ODS Graphics: fills areas under a parametric density curve or kernel density estimate with colors and patterns. Enclose the FILL option in parentheses after a curve option or the KERNEL option, as in the following statements:

proc capability;
   histogram length / normal(fill) cfill=green pfill=solid;
run;

Depending on the area to be filled (outside or between the specification limits), you can specify the color and pattern with options in the SPEC statement and HISTOGRAM statement, as summarized in the following table:

Area Under Curve                      Statement    Option
between specification limits          HISTOGRAM    CFILL=
                                      HISTOGRAM    PFILL=
left of lower specification limit     SPEC         CLEFT=
                                      SPEC         PLEFT=
right of upper specification limit    SPEC         CRIGHT=
                                      SPEC         PRIGHT=

If you do not display specification limits, the CFILL= and PFILL= options specify the color and pattern for the entire area under the curve. Solid fills are used by default if patterns are not specified. You can specify the FILL option with only one fitted curve. For an example, see Output 5.8.1. Refer to SAS/GRAPH: Reference for a list of available patterns and colors. If you do not specify the FILL option but specify the options in the preceding table, the colors and patterns are applied to the corresponding areas under the histogram.

FRONTREF

Traditional Graphics: draws reference lines requested with the HREF= and VREF= options in front of the histogram bars. When the NOGSTYLE system option is specified, reference lines are drawn behind the histogram bars by default, and can be obscured by them.

GAMMA<(gamma-options)>

displays a fitted gamma density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{h v}{\Gamma (\alpha )\sigma } (\frac{x - \theta }{\sigma })^{\alpha - 1} \exp (-(\frac{x - \theta }{\sigma })) &  \mbox{for $x > \theta $} \\ 0 &  \mbox{for $x \leq \theta $} \end{array} \right.  \]

where

$\theta =$ threshold parameter
$\sigma =$ scale parameter $(\sigma >0)$
$\alpha =$ shape parameter $(\alpha >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ for the gamma distribution must be less than the minimum data value. You can specify $\theta $ with the THETA= gamma-option. The default value for $\theta $ is 0. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. In addition, the gamma distribution has a shape parameter $\alpha $ and a scale parameter $\sigma $. You can specify these parameters with the ALPHA= and SIGMA= gamma-options. By default, maximum likelihood estimates are computed for $\alpha $ and $\sigma $. For example, the following statements fit a gamma curve with $\theta =4$ and with maximum likelihood estimates for $\alpha $ and $\sigma $:

proc capability;
   histogram length / gamma(theta=4);
run;

Note that the maximum likelihood estimate of $\alpha $ is calculated iteratively using the Newton-Raphson approximation. The ALPHADELTA=, ALPHAINITIAL=, and MAXITER= gamma-options control the approximation.

The GAMMA option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the GAMMA option. See Example 5.9 and Formulas for Fitted Curves.

GAMMA=value-list

specifies the second shape parameter $\gamma $ for Johnson $S_ B$ and Johnson $S_ U$ density curves requested with the SB and SU options. Enclose the GAMMA= option in parentheses after the SB or SU option. If you do not specify a value for $\gamma $, the procedure calculates an estimate.

GRID

Traditional Graphics, ODS Graphics: adds a grid to the histogram. Grid lines are horizontal lines positioned at major tick marks on the vertical axis.

GUMBEL<(Gumbel-options)>

displays a fitted Gumbel density curve on the histogram. The Gumbel distribution is also known as the Type 1 extreme value distribution. The curve equation is

\[  p(x) = \frac{h v}{\sigma }e^{-(x-\mu )/\sigma } \exp \left( -e^{-(x-\mu )/\sigma }\right)  \]

where

$\mu =$ location parameter
$\sigma =$ scale parameter $(\sigma >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

You can specify values for $\mu $ and $\sigma $ with the MU= and SIGMA= Gumbel-options. By default, maximum likelihood estimates are computed for $\mu $ and $\sigma $.

The GUMBEL option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the GUMBEL option. See Formulas for Fitted Curves.

HANGING
HANG

requests a hanging histogram, as illustrated in Figure 5.12.

Figure 5.12: Hanging Histogram

You can use the HANGING option with only one fitted density curve. A hanging histogram aligns the tops of the histogram bars (displayed as lines) with the fitted curve. The lines are positioned at the midpoints of the histogram bins. A hanging histogram is a goodness-of-fit diagnostic in the sense that the closer the lines are to the horizontal axis, the better the fit. Hanging histograms are discussed by Tukey (1977), Wainer (1974), and Velleman and Hoaglin (1981).
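
As a sketch, the following statements request a hanging histogram against a fitted normal curve. The data set Parts and its values are hypothetical:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts noprint;
   /* exactly one fitted curve (NORMAL) accompanies HANGING */
   histogram Length / normal hanging;
run;
```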

HOFFSET=value

Traditional Graphics: specifies the offset in percent screen units at both ends of the horizontal axis. Specify HOFFSET=0 to eliminate the default offset.

IGAUSS<(iGauss-options)>

displays a fitted inverse Gaussian density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} hv \left(\frac{\lambda }{2\pi x^3}\right)^{1/2} \exp (-\frac{\lambda }{2\mu ^2 x}(x-\mu )^2) &  \mbox{for $x > 0 $} \\ 0 &  \mbox{for $x \leq 0 $} \end{array} \right.  \]

where

$\mu =$ mean parameter $(\mu > 0)$
$\lambda =$ shape parameter $(\lambda >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

You can specify values for $\mu $ and $\lambda $ with the MU= and LAMBDA= iGauss-options. By default, the sample mean is used for $\mu $ and a maximum likelihood estimate is computed for $\lambda $.

The IGAUSS option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the IGAUSS option. See Formulas for Fitted Curves.

INDICES

requests capability indices based on the fitted distribution. Enclose the keyword INDICES in parentheses after the distribution keyword. See Indices Using Fitted Curves for computational details and see Output 5.11.2.
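
The following sketch requests indices based on a fitted lognormal curve. The data set Parts, its values, and the specification limits are hypothetical; the limits are supplied with the LSL= and USL= options in a SPEC statement, on the assumption that indices are of interest only when specification limits are defined:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts;
   spec lsl=9.9 usl=10.4;                   /* hypothetical specification limits */
   histogram Length / lognormal(indices);   /* indices from the fitted curve     */
run;
```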

INTERBAR=value

Traditional Graphics: specifies the horizontal space in percent screen units between histogram bars. By default, the bars are contiguous.

K=NORMAL | QUADRATIC | TRIANGULAR

specifies the kernel function (normal, quadratic, or triangular) used to compute a kernel density estimate. Enclose the K= option in parentheses after the KERNEL option, as in the following statements:

proc capability;
   histogram length / kernel(k=quadratic);
run;

You can specify kernel functions for up to five estimates. You can also use the K= option together with the C= option, which specifies standardized bandwidths. If you specify more kernel functions than bandwidths, the last bandwidth in the list is repeated for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, the last kernel function is repeated for the remaining estimates. For example, the following statements compute three estimates with bandwidths of 0.5, 1.0, and 1.5:

proc capability;
   histogram length / kernel(c=0.5 1.0 1.5 k=normal quadratic);
run;

The first estimate uses a normal kernel, and the last two estimates use a quadratic kernel. By default, a normal kernel is used.

KERNEL<( kernel-options )>

superimposes up to five kernel density estimates on the histogram. You can specify the kernel-options described in the following table:

Option     Description
C=         specifies the smoothing parameter
COLOR=     specifies the color of the curve
FILL       specifies that the area under the curve is to be filled
K=         specifies the type of kernel function
L=         specifies the line style for the curve
LOWER=     specifies the lower bound for the curve
SYMBOL=    specifies the character used for the kernel density curve in line printer plots
UPPER=     specifies the upper bound for the curve
W=         specifies the width of the curve

You can request multiple kernel density estimates on the same histogram by specifying a list of values for either the C= or K= option. For more information, see the entries for these options. Also see Output 5.6.1 and Kernel Density Estimates. By default, kernel density estimates are computed using the AMISE method.

LAMBDA=value

specifies the shape parameter $\lambda $ for fitted curves requested with the IGAUSS option. Enclose the LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a value for $\lambda $, the procedure calculates a maximum likelihood estimate.

LEGEND=name | NONE

Traditional Graphics: specifies the name of a LEGEND statement describing the legend for specification limit reference lines and fitted curves. Specifying LEGEND=NONE suppresses all legend information and is equivalent to specifying the NOLEGEND option.

LGRID=n

Traditional Graphics: specifies the line type for the grid requested with the GRID option. If you use the LGRID= option, you do not need to specify the GRID option. The default is 1, which produces a solid line.

LOGNORMAL<(lognormal-options)>

displays a fitted lognormal density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{h v}{\sigma \sqrt {2\pi }(x - \theta )} \exp \left(-\frac{(\log (x-\theta )-\zeta )^{2}}{2\sigma ^{2}}\right) &  \mbox{for $ x > \theta $} \\ 0 &  \mbox{for $ x \leq \theta $} \end{array} \right.  \]

where

$\theta =$ threshold parameter
$\zeta =$ scale parameter
$\sigma =$ shape parameter $(\sigma >0)$
$h =$ width of histogram interval
$v =$ vertical scaling factor

and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

Note that the lognormal distribution is also referred to as the $S_ L$ distribution in the Johnson system of distributions.

The parameter $\theta $ for the lognormal distribution must be less than the minimum data value. You can specify $\theta $ with the THETA= lognormal-option. The default value for $\theta $ is zero. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. You can specify the parameters $\sigma $ and $\zeta $ with the SIGMA= and ZETA= lognormal-options. By default, maximum likelihood estimates are computed for $\sigma $ and $\zeta $. For example, the following statements fit a lognormal distribution function with a default value of $\theta =0$ and with maximum likelihood estimates for $\sigma $ and $\zeta $:

proc capability;
   histogram length / lognormal;
run;

The LOGNORMAL option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options that you can specify with the LOGNORMAL option. See Example 5.9 and Formulas for Fitted Curves.

LOWER=value-list

specifies lower bounds for kernel density estimates requested with the KERNEL option. Enclose the LOWER= option in parentheses after the KERNEL option. You can specify up to five lower bounds for multiple kernel density estimates. If you specify more kernel estimates than lower bounds, the last lower bound is repeated for the remaining estimates.
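
As a sketch, the following statements request two kernel density estimates but supply only one lower bound, so the single bound is repeated for the second estimate. The data set Parts, its values, and the bound of 10 are hypothetical:

```sas
/* Hypothetical sample data, all values above 10 */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts noprint;
   /* two bandwidths, one lower bound: LOWER=10 applies to both estimates */
   histogram Length / kernel(c=0.5 1.0 lower=10);
run;
```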

MAXNBIN=n

specifies the maximum number of bins to be displayed in a comparative histogram. This option is useful in situations where the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number of bins with the MAXNBIN= option, you can narrow the window about the data distribution in the key cell. Note that the MAXNBIN= option provides an alternative to the MAXSIGMAS= option.

MAXSIGMAS=value

limits the number of bins to be displayed to a range of value standard deviations (of the data in the key cell) above and below the mean of the data in the key cell. This option is useful in situations where the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. If the cell scales differ considerably, however, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number of bins with the MAXSIGMAS= option, you narrow the window about the data distribution in the key cell. Note that the MAXSIGMAS= option provides an alternative to the MAXNBIN= option.
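
The following sketch shows the kind of comparative histogram where this option helps: two cells whose spreads differ greatly. The data set Batches, the CLASS variable Supplier, and all values are hypothetical, and which cell serves as the key cell depends on the CLASS statement defaults:

```sas
/* Hypothetical data: supplier A is tightly clustered, supplier B is not */
data Batches;
   input Supplier $ Length @@;
   datalines;
A 10.02 A 10.05 A 10.08 A 10.11 A 10.14 A 10.18
B  9.20 B  9.85 B 10.10 B 10.60 B 11.30 B 12.10
;
run;

proc capability data=Batches noprint;
   class Supplier;
   /* show bins only within 4 key-cell standard deviations of the key-cell mean */
   histogram Length / maxsigmas=4;
run;
```

Replacing MAXSIGMAS=4 with a MAXNBIN= value would cap the bin count directly instead.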

MIDPERCENTS

requests a table listing the midpoints and percent of observations in each histogram interval. For example, the following statements create the table in Figure 5.13:

proc capability;
   histogram Length / midpercents;
run;

Figure 5.13: Table of Midpoints and Observed Percentages

The CAPABILITY Procedure

Histogram Bins for Length

Bin Midpoint    Observed Percent
10.02           12.000
10.08           32.000
10.14           28.000
10.20           18.000
10.26            6.000
10.32            4.000


If you specify the MIDPERCENTS option in parentheses after a density estimate option, a table listing the midpoints, observed percent of observations, and the estimated percent of the population in each interval (estimated from the fitted distribution) is printed.

The following statements create the table shown in Figure 5.14:

proc capability;
   histogram Length / gamma(theta=3 midpercents);
run;

Figure 5.14: Table of Observed and Expected Percentages

The CAPABILITY Procedure
Fitted Gamma Distribution for Length (Attachment Point Offset in mm)

Histogram Bin Percents for Gamma Distribution

Bin Midpoint    Observed Percent    Estimated Percent
10.02           12.000              11.480
10.08           32.000              26.182
10.14           28.000              31.354
10.20           18.000              19.916
10.26            6.000               6.766
10.32            4.000               1.238


MIDPOINTS=value-list | KEY | UNIFORM

specifies how to determine the midpoints for the histogram intervals. If you specify a value-list, the width of the histogram bars is the difference between consecutive midpoints. The procedure uses the same values for all variables. See Output 5.9.1.

The range of midpoints, extended at each end by half of the bar width, must cover the range of the data as well as any specification limits. For example, if you specify

midpoints=2 to 10 by 0.5

then all of the observations and specification limits should fall between 1.75 and 10.25. (Otherwise, a default list of midpoints is used.) You must use evenly spaced midpoints listed in increasing order.

KEY

determines the midpoints for the data in the key cell. The initial number of midpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the midpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.

UNIFORM

determines the midpoints by using all the observations as if there were no cells. In other words, the number of midpoints is based on the total sample size by using the method of Terrell and Scott (1985).

Neither KEY nor UNIFORM applies unless you use the CLASS statement. By default, if you use a CLASS statement, MIDPOINTS=KEY. However, if the key cell is empty, then MIDPOINTS=UNIFORM. If you do not use a CLASS statement, the procedure computes the midpoints by using the algorithm described in Terrell and Scott (1985). The default midpoints are primarily applicable to continuous data that are approximately normally distributed.

If you produce traditional graphics and use the MIDPOINTS= and HAXIS= options, you can use the ORDER= option in the AXIS statement you specified with the HAXIS= option. However, for the tick mark labels to coincide with the histogram interval midpoints, the range of the ORDER= list must encompass the range of the MIDPOINTS= list, as illustrated in the following statements:

proc capability;
   histogram length / midpoints=20 to 80 by 10
                      haxis=axis1;
   axis1 length=6 in order=10 20 30 40 50 60 70 80 90;
run;

MIDPTAXIS=name

Traditional Graphics: is an alias for the HAXIS= option described earlier in this section.

MU=value-list

specifies the parameter $\mu $ for fitted curves requested with the GUMBEL, IGAUSS, and NORMAL options. Enclose the MU= option in parentheses after the distribution keyword. For the normal and inverse Gaussian distributions, the default value of $\mu $ is the sample mean. If you do not specify a value for $\mu $ for the Gumbel distribution, the procedure calculates a maximum likelihood estimate.

NENDPOINTS=n

specifies the number of histogram interval endpoints and causes the endpoints, rather than interval midpoints, to be aligned with horizontal axis tick marks.

NMIDPOINTS=n

specifies the number of histogram intervals.
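
As a sketch, the following statements draw the same variable twice, once with a fixed number of intervals and once with a fixed number of endpoints aligned with the tick marks. The data set Parts, its values, and the counts 5 and 6 are hypothetical:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts noprint;
   histogram Length / nmidpoints=5;   /* five intervals, ticks at midpoints */
   histogram Length / nendpoints=6;   /* six endpoints, ticks at endpoints  */
run;
```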

NOBARS

suppresses drawing of histogram bars. This option is useful when you want to display fitted curves only.

NOCURVELEGEND
NOCURVEL

suppresses the portion of the legend for fitted curves. If you use the INSET statement to display information about the fitted curve on the histogram, you can use the NOCURVELEGEND option to prevent the information about the fitted curve from being repeated in a legend at the bottom of the histogram. See Output 5.15.1.

NOLEGEND

suppresses legends for specification limits, fitted curves, distribution lines, and hidden observations. See Example 5.13. Specifying the NOLEGEND option is equivalent to specifying LEGEND=NONE.

NOPLOT

suppresses the creation of a plot. Use the NOPLOT option when you want only to print summary statistics for a fitted density or create either an OUTFIT= or an OUTHISTOGRAM= data set. See Example 5.11.
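
The following sketch fits a normal curve without drawing a histogram and saves the results in output data sets. The data set Parts, its values, and the output data set names FitStats and Bins are hypothetical:

```sas
/* Hypothetical sample data */
data Parts;
   input Length @@;
   datalines;
10.02 10.05 10.08 10.09 10.11 10.14 10.15 10.18 10.21 10.27
;
run;

proc capability data=Parts noprint;
   /* no plot is created; only the two output data sets are produced */
   histogram Length / normal noplot outfit=FitStats outhistogram=Bins;
run;

proc print data=FitStats;   /* parameter estimates and goodness-of-fit information */
run;

proc print data=Bins;       /* interval midpoints and percents */
run;
```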

NOPRINT

suppresses printed output summarizing the fitted curve. Enclose the NOPRINT option in parentheses following the distribution option. See Customizing a Histogram for an example.

NORMAL<(normal-options)>

displays a fitted normal density curve on the histogram. The curve equation is

\[  p(x) = \begin{array}{ll} \frac{h v}{\sigma \sqrt {2\pi }} \exp \left(-\frac{1}{2} (\frac{x - \mu }{\sigma })^{2}\right) &  \mbox{for $-\infty < x < \infty $} \end{array}  \]

where $\mu =$ mean, $\sigma =$ standard deviation $(\sigma >0)$, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

Note that the normal distribution is also referred to as the $S_ N$ distribution in the Johnson system of distributions.

You can specify values for $\mu $ and $\sigma $ with the MU= and SIGMA= normal-options, as shown in the following statements:

proc capability;
   histogram length / normal(mu=14 sigma=0.05);
run;

By default, the sample mean and sample standard deviation are used for $\mu $ and $\sigma $. The NORMAL option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options that you can specify with the NORMAL option. See Figure 5.10 and Formulas for Fitted Curves.
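As a rough check of how the scaling factors enter the curve, consider the height of the normal curve at its peak, $x = \mu $. The interval width $h = 0.02$ used here is an assumed value for illustration only; $\mu = 14$ and $\sigma = 0.05$ are the values from the statements above, and VSCALE=PERCENT gives $v = 100$:

\[  p(\mu ) = \frac{h v}{\sigma \sqrt {2\pi }} = \frac{(0.02)(100)}{(0.05)(2.5066)} \approx 15.96  \]

Under these assumptions the curve peaks near 16 percent. With VSCALE=PROPORTION ($v = 1$), the same curve would peak near 0.16.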

NOSPECLEGEND
NOSPECL

suppresses the portion of the legend for specification limit reference lines. See Figure 5.11.

NOTABCONTENTS

suppresses the table of contents entries for tables produced by the HISTOGRAM statement. See the section ODS Tables for descriptions of the tables produced by the HISTOGRAM statement.

OUTFIT=SAS-data-set

creates a SAS data set that contains parameter estimates for fitted curves and related goodness-of-fit information. See Output Data Sets.

OUTHISTOGRAM=SAS-data-set
OUTHIST=SAS-data-set

creates a SAS data set that contains information about histogram intervals. Specifically, the data set contains the midpoints of the histogram intervals, the observed percent of observations in each interval, and the estimated percent of observations in each interval (estimated from each of the specified fitted curves). See Output Data Sets.

OUTKERNEL=SAS-data-set

creates a SAS data set containing information about kernel density estimates requested with the KERNEL option. See OUTKERNEL= Output Data Set for details.

PARETO<(Pareto-options)>

displays a fitted generalized Pareto density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{hv}{\sigma }(1 - \alpha (x-\theta )/\sigma )^{1/\alpha -1} &  \mbox{if $ \alpha \neq 0$} \\ \frac{hv}{\sigma } \exp (-(x-\theta )/\sigma ) &  \mbox{if $ \alpha = 0$} \end{array} \right.  \]

where $\theta =$ threshold parameter, $\sigma =$ scale parameter $(\sigma >0)$, $\alpha =$ shape parameter, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ must be less than the minimum data value. You can specify $\theta $ with the THETA= Pareto-option. The default value for $\theta $ is zero. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. In addition, the generalized Pareto distribution has a shape parameter $\alpha $ and a scale parameter $\sigma $. You can specify these parameters with the ALPHA= and SIGMA= Pareto-options. By default, maximum likelihood estimates are computed for $\alpha $ and $\sigma $.

The PARETO option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the PARETO option. See Formulas for Fitted Curves.

PCTAXIS=name|value-list

is an alias for the VAXIS= option.

PERCENTS=value-list
PERCENT=value-list

specifies a list of percents for which quantiles calculated from the data and quantiles estimated from the fitted curve are tabulated. The percents must be between 0 and 100. Enclose the PERCENTS= option in parentheses after the curve option. The default percents are 1, 5, 10, 25, 50, 75, 90, 95, and 99.

For example, the following statements create the table shown in Figure 5.15:

proc capability;
   histogram Length / lognormal (percents=1 3 5 95 97 99);
run;

Figure 5.15: Estimated and Observed Quantiles for the Lognormal Curve

The CAPABILITY Procedure
Fitted Lognormal Distribution for Length (Attachment Point Offset in mm)

Quantiles for Lognormal Distribution

              Quantile
Percent   Observed   Estimated
    1.0    10.0180     9.95696
    3.0    10.0180     9.98937
    5.0    10.0310    10.00658
   95.0    10.2780    10.24963
   97.0    10.2930    10.26729
   99.0    10.3220    10.30071


PFILL=pattern

Traditional Graphics: specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also specify the FILL option). See the entries for the CFILL= and FILL options for additional details. Refer to SAS/GRAPH: Reference for a list of pattern values. By default, the bars and curve areas are not filled.

POWER<(power-options)>

displays a fitted power function density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} hv \frac{\alpha }{\sigma }\left(\frac{x-\theta }{\sigma }\right)^{\alpha -1} &  \mbox{for $\theta < x < \theta + \sigma $} \\ 0 &  \mbox{for $x \leq \theta $ or $x \geq \theta + \sigma $ } \end{array} \right.  \]

where $\theta =$ threshold parameter, $\sigma =$ scale parameter $(\sigma >0)$, $\alpha =$ shape parameter, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ must be less than or equal to the minimum data value. You can specify $\theta $ and $\sigma $ with the THETA= and the SIGMA= power-options. The default values for $\theta $ and $\sigma $ are 0 and 1, respectively. You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for $\theta $ and $\sigma $.

In addition, the power function distribution has a shape parameter $\alpha $. You can specify $\alpha $ with the ALPHA= power-option. By default, a maximum likelihood estimate is computed for $\alpha $.

The POWER option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the POWER option. See Formulas for Fitted Curves.

RAYLEIGH<(Rayleigh-options)>

displays a fitted Rayleigh density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} hv \frac{x-\theta }{\sigma ^2}e^{-(x-\theta )^2/(2\sigma ^2)} &  \mbox{for $x \geq \theta $} \\ 0 &  \mbox{for $x <\theta $} \end{array} \right.  \]

where $\theta =$ threshold parameter, $\sigma =$ scale parameter $(\sigma >0)$, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ must be less than or equal to the minimum data value. You can specify $\theta $ with the THETA= Rayleigh-option. The default value for $\theta $ is zero. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. You can specify $\sigma $ with the SIGMA= Rayleigh-option. By default, a maximum likelihood estimate is computed for $\sigma $.

The RAYLEIGH option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the RAYLEIGH option. See Formulas for Fitted Curves.

RTINCLUDE

includes the right endpoint of each histogram interval in that interval. By default, the left endpoint is included in the histogram interval.

SB<( $S_{B}$ -options )>

displays a fitted Johnson $S_ B$ density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{\delta h v}{\sigma \sqrt {2\pi } } \left[ \left( \frac{x - \theta }{\sigma } \right) \left( 1 - \frac{x - \theta }{\sigma } \right) \right]^{-1} \times & \\ \exp \left[ -\frac{1}{2} \left( \gamma + \delta \log ( \frac{x - \theta }{\theta + \sigma -x} ) \right)^2 \right] &  \mbox{for $ \theta < x < \theta + \sigma $} \\ 0 &  \mbox{for $ x \leq \theta $ or $ x \geq \theta + \sigma $} \end{array} \right.  \]

where $\theta =$ threshold parameter $(-\infty < \theta < \infty )$, $\sigma =$ scale parameter $(\sigma > 0)$, $\delta =$ shape parameter $(\delta >0)$, $\gamma =$ shape parameter $(-\infty < \gamma < \infty )$, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The $S_ B$ distribution is bounded below by the parameter $\theta $ and above by the value $\theta + \sigma $. The parameter $\theta $ must be less than the minimum data value. You can specify $\theta $ with the THETA= $S_ B$-option, or you can request that $\theta $ be estimated with the THETA=EST $S_ B$-option. The default value for $\theta $ is zero. The sum $\theta + \sigma $ must be greater than the maximum data value. The default value for $\sigma $ is one. You can specify $\sigma $ with the SIGMA= $S_ B$-option, or you can request that $\sigma $ be estimated with the SIGMA=EST $S_ B$-option. You can specify $\delta $ with the DELTA= $S_ B$-option, and you can specify $\gamma $ with the GAMMA= $S_ B$-option. Note that the $S_ B$-options are given in parentheses after the SB option.

By default, the method of percentiles is used to estimate the parameters of the $S_ B$ distribution. Alternatively, you can request the method of moments or the method of maximum likelihood with the FITMETHOD=MOMENTS or FITMETHOD=MLE options, respectively. Consider the following example:

proc capability;
   histogram length / sb;
   histogram length / sb( theta=est sigma=est );
   histogram length / sb( theta=0.5 sigma=8.4 
                          delta=0.8 gamma=-0.6 );
run;

The first HISTOGRAM statement fits an $S_ B$ distribution with default values of $\theta =0$ and $\sigma =1$ and with percentile-based estimates for $\delta $ and $\gamma $. The second HISTOGRAM statement estimates all four parameters with the method of percentiles. The third HISTOGRAM statement displays an $S_ B$ curve with specified values for all four parameters.

The SB option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the SB option.

SIGMA=value-list

specifies the parameter $\sigma $ for fitted curves requested with the BETA, EXPONENTIAL, GAMMA, GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, SB, SU, and WEIBULL options. Enclose the SIGMA= option in parentheses after the distribution keyword. The following table summarizes the use of the SIGMA= option.

Distribution Keyword    SIGMA= Specifies             Default Value                  Alias
BETA                    scale parameter $\sigma $    1                              SCALE=
EXPONENTIAL             scale parameter $\sigma $    maximum likelihood estimate    SCALE=
GAMMA                   scale parameter $\sigma $    maximum likelihood estimate    SCALE=
GUMBEL                  scale parameter $\sigma $    maximum likelihood estimate
LOGNORMAL               shape parameter $\sigma $    maximum likelihood estimate    SHAPE=
NORMAL                  scale parameter $\sigma $    standard deviation
PARETO                  scale parameter $\sigma $    maximum likelihood estimate
POWER                   scale parameter $\sigma $    1                              SCALE=
RAYLEIGH                scale parameter $\sigma $    maximum likelihood estimate
SB                      scale parameter $\sigma $    1                              SCALE=
SU                      scale parameter $\sigma $    percentile-based estimate      SCALE=
WEIBULL                 scale parameter $\sigma $    maximum likelihood estimate    SCALE=

If you specify SIGMA=EST, an estimate is computed for $\sigma $. For syntax examples, see the entries for the distribution options.
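The following sketch illustrates the SIGMA= option, its SCALE= alias, and SIGMA=EST for a gamma fit. The data set Waits, the variable Minutes, and the simulated values are assumptions for illustration; the first two statements are expected to request the same curve because SCALE= is listed as an alias for SIGMA= with the GAMMA option.

```sas
/* Hypothetical simulated data; Waits and Minutes are assumed names */
data Waits;
   call streaminit(101);
   do i = 1 to 200;
      Minutes = 0.5 * rand('gamma', 3);   /* shape 3, scale 0.5 */
      output;
   end;
   drop i;
run;

proc capability data=Waits;
   histogram Minutes / gamma(sigma=0.5);   /* fix the scale parameter     */
   histogram Minutes / gamma(scale=0.5);   /* same request via the alias  */
   histogram Minutes / gamma(sigma=est);   /* ask for an estimate instead */
run;
```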

SPECLEGEND=name | NONE

specifies the name of a LEGEND statement describing the legend for specification limits and fitted curves. Specifying SPECLEGEND=NONE, which suppresses the portion of the legend for specification limit reference lines, is equivalent to specifying the NOSPECLEGEND option.

SU<( $S_{U}$ -options )>

displays a fitted Johnson $S_ U$ density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{ \delta h v}{\sigma \sqrt {2\pi } } \frac{ 1 }{ \sqrt { 1 + \left( (x - \theta ) / \sigma \right)^2 } } \times & \\ \exp \left[ -\frac{1}{2} \left( \gamma + \delta \sinh ^{-1} \left( \frac{x - \theta }{\sigma } \right) \right)^2 \right] &  \mbox{for $ x > \theta $} \\ 0 &  \mbox{for $ x \leq \theta $ } \end{array} \right.  \]

where

$\theta =$ location parameter $(-\infty < \theta < \infty )$, $\sigma =$ scale parameter $(\sigma > 0)$, $\delta =$ shape parameter $(\delta >0)$, $\gamma =$ shape parameter $(-\infty < \gamma < \infty )$, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

You can specify the parameters with the THETA=, SIGMA=, DELTA=, and GAMMA= $S_ U$-options, which are enclosed in parentheses after the SU option. If you do not specify these parameters, they are estimated.

By default, the method of percentiles is used to estimate the parameters of the $S_ U$ distribution. Alternatively, you can request the method of moments or the method of maximum likelihood with the FITMETHOD=MOMENTS or FITMETHOD=MLE options, respectively. Consider the following example:

proc capability;
   histogram length / su;      
   histogram length / su( theta=0.5 sigma=8.4 
                          delta=0.8 gamma=-0.6 );
run;

The first HISTOGRAM statement estimates all four parameters with the method of percentiles. The second HISTOGRAM statement displays an $S_ U$ curve with specified values for all four parameters.

The SU option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options you can specify with the SU option.
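As a sketch of the alternative estimation methods mentioned above, the following statements request $S_ U$ fits by the method of moments and by maximum likelihood. The data set Parts, the variable Length, and the simulated values are assumptions for illustration; the FITMETHOD= option is enclosed in parentheses after the SU option.

```sas
/* Hypothetical simulated data; Parts and Length are assumed names */
data Parts;
   call streaminit(5150);
   do i = 1 to 300;
      Length = 10 + 0.1 * rand('t', 5);   /* symmetric, heavier tails */
      output;
   end;
   drop i;
run;

proc capability data=Parts;
   histogram Length / su(fitmethod=moments);   /* method of moments   */
   histogram Length / su(fitmethod=mle);       /* maximum likelihood  */
run;
```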

SYMBOL='character'

Line Printer: specifies the character used for the density curve or kernel density curve in line printer plots. Enclose the SYMBOL= option in parentheses after the distribution option or the KERNEL option. The default character is the first letter of the distribution keyword or '1' for the first kernel density estimate, '2' for the second kernel density estimate, and so on. If you use the SYMBOL= option with the KERNEL option, you can specify a list of up to five characters in parentheses for multiple kernel density estimates. If there are more estimates than characters, the last character specified is used for the remaining estimates.

THETA=value-list
THRESHOLD=value-list

specifies the lower threshold parameter $\theta $ for curves requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, SB, and WEIBULL options, and the location parameter $\theta $ for curves requested with the SU option. Enclose the THETA= option in parentheses after the curve option. See Example 5.8. The default value is zero. If you specify THETA=EST, an estimate is computed for $\theta $.

UPPER=value-list

specifies upper bounds for kernel density estimates requested with the KERNEL option. Enclose the UPPER= option in parentheses after the KERNEL option. You can specify up to five upper bounds for multiple kernel density estimates. If you specify more kernel estimates than upper bounds, the last upper bound is repeated for the remaining estimates.

VOFFSET=value

Traditional Graphics: specifies the offset in percent screen units at the upper end of the vertical axis.

VSCALE=COUNT | PERCENT | PROPORTION

specifies the scale of the vertical axis. The value COUNT scales the data in units of the number of observations per data unit. The value PERCENT scales the data in units of percent of observations per data unit. The value PROPORTION scales the data in units of proportion of observations per data unit. See Figure 5.11 for an illustration of VSCALE=COUNT. The default is PERCENT.
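The following sketch requests the same histogram and normal curve on two different vertical scales. The data set Parts, the variable Length, and the simulated values are assumptions for illustration.

```sas
/* Hypothetical simulated data; Parts and Length are assumed names */
data Parts;
   call streaminit(2718);
   do i = 1 to 200;
      Length = rand('normal', 10, 0.1);
      output;
   end;
   drop i;
run;

proc capability data=Parts;
   histogram Length / normal vscale=count;        /* vertical axis in counts      */
   histogram Length / normal vscale=proportion;   /* vertical axis in proportions */
run;
```

Only the vertical scaling factor $v$ in the curve equations changes between the two requests, so the shapes of the bars and the fitted curve should look the same apart from the axis units.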

WBARLINE=n

Traditional Graphics: specifies the width of bar outlines. By default, n = 1.

WEIBULL<(Weibull-options)>

displays a fitted Weibull density curve on the histogram. The curve equation is

\[  p(x) = \left\{  \begin{array}{ll} \frac{ch v}{\sigma } (\frac{x - \theta }{\sigma })^{c - 1} \exp (-(\frac{x- \theta }{\sigma })^ c) &  \mbox{for $ x > \theta $} \\ 0 &  \mbox{for $ x \leq \theta $} \end{array} \right.  \]

where $\theta =$ threshold parameter, $\sigma =$ scale parameter $(\sigma >0)$, $c =$ shape parameter $(c >0)$, $h =$ width of histogram interval, $v =$ vertical scaling factor, and

\[  v = \left\{  \begin{array}{ll} n &  \mbox{the sample size, for VSCALE=COUNT} \\ 100 &  \mbox{for VSCALE=PERCENT} \\ 1 &  \mbox{for VSCALE=PROPORTION} \end{array} \right.  \]

The parameter $\theta $ must be less than the minimum data value. You can specify $\theta $ with the THETA= Weibull-option. The default value for $\theta $ is zero. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta $. You can specify $\sigma $ and c with the SIGMA= and C= Weibull-options. By default, maximum likelihood estimates are computed for c and $\sigma $. For example, the following statements fit a Weibull distribution with $\theta =15$ and with maximum likelihood estimates for $\sigma $ and c:

proc capability;
   histogram length / weibull(theta=15);
run;

Note that the maximum likelihood estimate of c is calculated iteratively using the Newton-Raphson approximation. The CDELTA=, CINITIAL=, and MAXITER= Weibull-options control the approximation.

The WEIBULL option can appear only once in a HISTOGRAM statement. Table 5.19 lists secondary options that you can specify with the WEIBULL option. See Example 5.9 and Formulas for Fitted Curves.
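The following sketch shows how the Weibull-options that control the Newton-Raphson approximation might be specified together. The data set Failures, the variable Hours, the simulated values, and the particular settings chosen for CINITIAL=, MAXITER=, and CDELTA= are assumptions for illustration, not recommended values.

```sas
/* Hypothetical simulated data; Failures and Hours are assumed names */
data Failures;
   call streaminit(42);
   do i = 1 to 200;
      Hours = rand('weibull', 2, 100);   /* shape 2, scale 100 */
      output;
   end;
   drop i;
run;

proc capability data=Failures;
   /* illustrative settings: starting value for c, iteration limit, */
   /* and convergence criterion for the estimate of c               */
   histogram Hours / weibull(cinitial=2 maxiter=50 cdelta=0.0001);
run;
```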

WGRID=n

Traditional Graphics: specifies the width of the grid lines requested with the GRID option. By default, grid lines are the same width as the axes. If you use the WGRID= option, you do not need to specify the GRID option.

ZETA=value-list

specifies a value for the scale parameter $\zeta $ for lognormal density curves requested with the LOGNORMAL option. Enclose the ZETA= option in parentheses after the LOGNORMAL option. By default, the procedure calculates a maximum likelihood estimate for $\zeta $. You can specify the SCALE= option as an alias for the ZETA= option.