HISTOGRAM Statement
The HISTOGRAM statement creates histograms and optionally superimposes estimated parametric and nonparametric probability density curves. You cannot use the WEIGHT statement with the HISTOGRAM statement. You can use any number of HISTOGRAM statements after a PROC UNIVARIATE statement. The components of the HISTOGRAM statement are as follows.
variables
are the variables for which histograms are to be created. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input data set. If you do not specify variables in a VAR statement or in the HISTOGRAM statement, then by default, a histogram is created for each numeric variable in the DATA= data set. If you use a VAR statement and do not specify any variables in the HISTOGRAM statement, then by default, a histogram is created for each variable listed in the VAR statement.
For example, suppose a data set named Steel contains exactly two numeric variables named Length and Width. The following statements create two histograms, one for Length and one for Width:
proc univariate data=Steel; histogram; run;
Likewise, the following statements create histograms for Length and Width:
proc univariate data=Steel; var Length Width; histogram; run;
The following statements create a histogram for Length only:
proc univariate data=Steel; var Length Width; histogram Length; run;
options
add features to the histogram. Specify all options after the slash (/) in the HISTOGRAM statement. Each option belongs to one of the following groups:
primary options for fitted parametric distributions and kernel density estimates
secondary options for fitted parametric distributions and kernel density estimates
general options for graphics and output data sets
For example, in the following statements, the NORMAL option displays a fitted normal curve on the histogram, the MIDPOINTS= option specifies midpoints for the histogram, and the CTEXT= option specifies the color of the text:
proc univariate data=Steel; histogram Length / normal midpoints = 5.6 5.8 6.0 6.2 6.4 ctext = blue; run;
Table 4.16 through Table 4.33 list the HISTOGRAM options by function. For complete descriptions, see the sections Dictionary of Options and Dictionary of Common Options.
Table 4.16 lists primary options that display parametric density estimates on the histogram. You can specify each primary option once in a given HISTOGRAM statement, and each primary option can display multiple curves from its family on the histogram.
Table 4.16: Primary Options for Parametric Fitted Distributions
Option | Description |
---|---|
BETA | fits beta distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\alpha$ and $\beta$ |
EXPONENTIAL | fits exponential distribution with threshold parameter $\theta$ and scale parameter $\sigma$ |
GAMMA | fits gamma distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
GUMBEL | fits Gumbel distribution with location parameter $\mu$ and scale parameter $\sigma$ |
IGAUSS | fits inverse Gaussian distribution with location parameter $\mu$ and shape parameter $\lambda$ |
LOGNORMAL | fits lognormal distribution with threshold parameter $\theta$, scale parameter $\zeta$, and shape parameter $\sigma$ |
NORMAL | fits normal distribution with mean $\mu$ and standard deviation $\sigma$ |
PARETO | fits generalized Pareto distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
POWER | fits power function distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
RAYLEIGH | fits Rayleigh distribution with threshold parameter $\theta$ and scale parameter $\sigma$ |
SB | fits Johnson $S_B$ distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\delta$ and $\gamma$ |
SU | fits Johnson $S_U$ distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\delta$ and $\gamma$ |
WEIBULL | fits Weibull distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $c$ |
Table 4.17 through Table 4.30 list secondary options that specify parameters for fitted parametric distributions and that control the display of fitted curves. Specify these secondary options in parentheses after the primary distribution option. For example, you can fit a normal curve by specifying the NORMAL option as follows:
proc univariate; histogram / normal(color=red mu=10 sigma=0.5); run;
The COLOR= normal-option draws the curve in red, and the MU= and SIGMA= normal-options specify the parameters $\mu = 10$ and $\sigma = 0.5$ for the curve. Note that the sample mean and sample standard deviation are used to estimate $\mu$ and $\sigma$, respectively, when the MU= and SIGMA= normal-options are not specified.
You can specify lists of values for secondary options to display more than one fitted curve from the same distribution family on a histogram. Option values are matched by list position. You can specify the value EST in a list of distribution parameter values to use an estimate of the parameter.
For example, the following code displays two normal curves on a histogram:
proc univariate; histogram / normal(color=(red blue) mu=10 est sigma=0.5 est); run;
The first curve is red, with $\mu = 10$ and $\sigma = 0.5$. The second curve is blue, with $\mu$ equal to the sample mean and $\sigma$ equal to the sample standard deviation.
See the section Formulas for Fitted Continuous Distributions for detailed information about the families of parametric distributions that you can fit with the HISTOGRAM statement.
Table 4.17: Secondary Options Used with All Parametric Distribution Options
Option | Description |
---|---|
COLOR= | specifies colors of density curves |
CONTENTS= | specifies table of contents entry for density curve grouping |
FILL | fills area under density curve |
L= | specifies line types of density curves |
MIDPERCENTS | prints table of midpoints of histogram intervals |
NOPRINT | suppresses tables summarizing curves |
PERCENTS= | lists percents for which quantiles calculated from data and quantiles estimated from curves are tabulated |
W= | specifies widths of density curves |
Table 4.18: Secondary Beta-Options
Option | Description |
---|---|
ALPHA= | specifies first shape parameter $\alpha$ for beta curve |
BETA= | specifies second shape parameter $\beta$ for beta curve |
SIGMA= | specifies scale parameter $\sigma$ for beta curve |
THETA= | specifies lower threshold parameter $\theta$ for beta curve |
Table 4.19: Secondary Exponential-Options
Option | Description |
---|---|
SIGMA= | specifies scale parameter $\sigma$ for exponential curve |
THETA= | specifies threshold parameter $\theta$ for exponential curve |
Table 4.20: Secondary Gamma-Options
Option | Description |
---|---|
ALPHA= | specifies shape parameter $\alpha$ for gamma curve |
ALPHADELTA= | specifies change in successive estimates of $\hat{\alpha}$ at which the Newton-Raphson approximation of $\hat{\alpha}$ terminates |
ALPHAINITIAL= | specifies initial value for $\alpha$ in the Newton-Raphson approximation of $\hat{\alpha}$ |
MAXITER= | specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{\alpha}$ |
SIGMA= | specifies scale parameter $\sigma$ for gamma curve |
THETA= | specifies threshold parameter $\theta$ for gamma curve |
Table 4.21: Secondary Gumbel-Options
Option | Description |
---|---|
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
MU= | specifies location parameter $\mu$ for Gumbel curve |
SIGMA= | specifies scale parameter $\sigma$ for Gumbel curve |
Table 4.22: Secondary Inverse Gaussian-Options
Option | Description |
---|---|
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
LAMBDA= | specifies shape parameter $\lambda$ for inverse Gaussian curve |
MU= | specifies location parameter $\mu$ for inverse Gaussian curve |
Table 4.23: Secondary Lognormal-Options
Option | Description |
---|---|
SIGMA= | specifies shape parameter $\sigma$ for lognormal curve |
THETA= | specifies threshold parameter $\theta$ for lognormal curve |
ZETA= | specifies scale parameter $\zeta$ for lognormal curve |
Table 4.24: Secondary Normal-Options
Option | Description |
---|---|
MU= | specifies mean $\mu$ for normal curve |
SIGMA= | specifies standard deviation $\sigma$ for normal curve |
Table 4.25: Secondary Pareto-Options
Option | Description |
---|---|
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
ALPHA= | specifies shape parameter $\alpha$ for generalized Pareto curve |
SIGMA= | specifies scale parameter $\sigma$ for generalized Pareto curve |
THETA= | specifies threshold parameter $\theta$ for generalized Pareto curve |
Table 4.26: Secondary Power-Options
Option | Description |
---|---|
ALPHA= | specifies shape parameter $\alpha$ for power function curve |
SIGMA= | specifies scale parameter $\sigma$ for power function curve |
THETA= | specifies threshold parameter $\theta$ for power function curve |
Table 4.27: Secondary Rayleigh-Options
Option | Description |
---|---|
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
SIGMA= | specifies scale parameter $\sigma$ for Rayleigh curve |
THETA= | specifies threshold parameter $\theta$ for Rayleigh curve |
Table 4.28: Secondary Johnson $S_B$-Options
Option | Description |
---|---|
DELTA= | specifies first shape parameter $\delta$ for Johnson $S_B$ curve |
FITINTERVAL= | specifies $z$-value for method of percentiles |
FITMETHOD= | specifies method of parameter estimation |
FITTOLERANCE= | specifies tolerance for method of percentiles |
GAMMA= | specifies second shape parameter $\gamma$ for Johnson $S_B$ curve |
SIGMA= | specifies scale parameter $\sigma$ for Johnson $S_B$ curve |
THETA= | specifies lower threshold parameter $\theta$ for Johnson $S_B$ curve |
Table 4.29: Secondary Johnson $S_U$-Options
Option | Description |
---|---|
DELTA= | specifies first shape parameter $\delta$ for Johnson $S_U$ curve |
FITINTERVAL= | specifies $z$-value for method of percentiles |
FITMETHOD= | specifies method of parameter estimation |
FITTOLERANCE= | specifies tolerance for method of percentiles |
GAMMA= | specifies second shape parameter $\gamma$ for Johnson $S_U$ curve |
SIGMA= | specifies scale parameter $\sigma$ for Johnson $S_U$ curve |
THETA= | specifies lower threshold parameter $\theta$ for Johnson $S_U$ curve |
Table 4.30: Secondary Weibull-Options
Option | Description |
---|---|
C= | specifies shape parameter $c$ for Weibull curve |
CDELTA= | specifies change in successive estimates of $\hat{c}$ at which the Newton-Raphson approximation of $\hat{c}$ terminates |
CINITIAL= | specifies initial value for $c$ in the Newton-Raphson approximation of $\hat{c}$ |
MAXITER= | specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{c}$ |
SIGMA= | specifies scale parameter $\sigma$ for Weibull curve |
THETA= | specifies threshold parameter $\theta$ for Weibull curve |
Use the option KERNEL(kernel-options) to compute kernel density estimates. Specify the following secondary options in parentheses after the KERNEL option to control features of density estimates requested with the KERNEL option.
Table 4.31: Kernel-Options
Option | Description |
---|---|
C= | specifies standardized bandwidth parameter $c$ |
COLOR= | specifies color of the kernel density curve |
FILL | fills area under kernel density curve |
K= | specifies type of kernel function |
L= | specifies line type used for kernel density curve |
LOWER= | specifies lower bound for kernel density curve |
UPPER= | specifies upper bound for kernel density curve |
W= | specifies line width for kernel density curve |
Table 4.32 summarizes options for enhancing histograms, and Table 4.33 summarizes options for requesting output data sets.
Table 4.32: General Graphics Options
Option | Description |
---|---|
ANNOKEY | applies annotation requested in ANNOTATE= data set to key cell only |
ANNOTATE= | specifies annotate data set |
BARLABEL= | produces labels above histogram bars |
BARWIDTH= | specifies width for the bars |
CAXIS= | specifies color for axis |
CBARLINE= | specifies color for outlines of histogram bars |
CFILL= | specifies color for filling under curve |
CFRAME= | specifies color for frame |
CFRAMESIDE= | specifies color for filling frame for row labels |
CFRAMETOP= | specifies color for filling frame for column labels |
CGRID= | specifies color for grid lines |
CHREF= | specifies color for HREF= lines |
CLIPREF | draws reference lines behind histogram bars |
CONTENTS= | specifies table of contents entry for histogram grouping |
CPROP= | specifies color for proportion of frequency bar |
CTEXT= | specifies color for text |
CTEXTSIDE= | specifies color for row labels of comparative histograms |
CTEXTTOP= | specifies color for column labels of comparative histograms |
CVREF= | specifies color for VREF= lines |
DESCRIPTION= | specifies description for plot in graphics catalog |
ENDPOINTS= | lists endpoints for histogram intervals |
FONT= | specifies software font for text |
FRONTREF | draws reference lines in front of histogram bars |
GRID | creates a grid |
HANGING | constructs hanging histogram |
HAXIS= | specifies AXIS statement for horizontal axis |
HEIGHT= | specifies height of text used outside framed areas |
HMINOR= | specifies number of horizontal minor tick marks |
HOFFSET= | specifies offset for horizontal axis |
HREF= | specifies reference lines perpendicular to the horizontal axis |
HREFLABELS= | specifies labels for HREF= lines |
HREFLABPOS= | specifies vertical position of labels for HREF= lines |
INFONT= | specifies software font for text inside framed areas |
INHEIGHT= | specifies height of text inside framed areas |
INTERBAR= | specifies space between histogram bars |
INTERTILE= | specifies distance between tiles |
LGRID= | specifies a line type for grid lines |
LHREF= | specifies line style for HREF= lines |
LVREF= | specifies line style for VREF= lines |
MAXNBIN= | specifies maximum number of bins to display |
MAXSIGMAS= | limits the number of bins that display to within a specified number of standard deviations above and below mean of data in key cell |
MIDPOINTS= | specifies midpoints for histogram intervals |
NAME= | specifies name for plot in graphics catalog |
NCOLS= | specifies number of columns in comparative histogram |
NENDPOINTS= | specifies number of histogram interval endpoints |
NMIDPOINTS= | specifies number of histogram interval midpoints |
NOBARS | suppresses histogram bars |
NOFRAME | suppresses frame around plotting area |
NOHLABEL | suppresses label for horizontal axis |
NOPLOT | suppresses plot |
NOTABCONTENTS | suppresses table of contents entries for tables produced by HISTOGRAM statement |
NOVLABEL | suppresses label for vertical axis |
NOVTICK | suppresses tick marks and tick mark labels for vertical axis |
NROWS= | specifies number of rows in comparative histogram |
PFILL= | specifies pattern for filling under curve |
RTINCLUDE | includes right endpoint in interval |
TURNVLABELS | turns and vertically strings out characters in labels for vertical axis |
VAXIS= | specifies AXIS statement or values for vertical axis |
VAXISLABEL= | specifies label for vertical axis |
VMINOR= | specifies number of vertical minor tick marks |
VOFFSET= | specifies length of offset at upper end of vertical axis |
VREF= | specifies reference lines perpendicular to the vertical axis |
VREFLABELS= | specifies labels for VREF= lines |
VREFLABPOS= | specifies horizontal position of labels for VREF= lines |
VSCALE= | specifies scale for vertical axis |
WAXIS= | specifies line thickness for axes and frame |
WBARLINE= | specifies line thickness for bar outlines |
WGRID= | specifies line thickness for grid |
Table 4.33: Options for Requesting Output Data Sets
Option | Description |
---|---|
MIDPERCENTS | creates table of histogram intervals |
OUTHISTOGRAM= | specifies information about histogram intervals |
OUTKERNEL= | creates a data set containing kernel density estimates |
The following entries provide detailed descriptions of options in the HISTOGRAM statement. Options marked with † are applicable only when traditional graphics are produced. See the section Dictionary of Common Options for detailed descriptions of options common to all plot statements.
ALPHA=value-list
specifies the shape parameter $\alpha$ for fitted curves requested with the BETA, GAMMA, PARETO, and POWER options. Enclose the ALPHA= option in parentheses after the distribution keyword. By default, or if you specify the value EST, the procedure calculates a maximum likelihood estimate for $\alpha$. You can specify A= as an alias for ALPHA= if you use it as a beta-option. You can specify SHAPE= as an alias for ALPHA= if you use it as a gamma-option.
BARLABEL=COUNT | PERCENT | PROPORTION
displays labels above the histogram bars. If you specify BARLABEL=COUNT, the label shows the number of observations associated with a given bar. If you specify BARLABEL=PERCENT, the label shows the percentage of observations represented by that bar. If you specify BARLABEL=PROPORTION, the label displays the proportion of observations associated with the bar.
BARWIDTH=value
specifies the width of the histogram bars in percentage screen units. If both the BARWIDTH= and INTERBAR= options are specified, the INTERBAR= option takes precedence.
BETA(beta-options)
displays fitted beta density curves on the histogram. The BETA option can occur only once in a HISTOGRAM statement, but it can request any number of beta curves. The beta distribution is bounded below by the parameter $\theta$ and above by the value $\theta + \sigma$. Use the THETA= and SIGMA= beta-options to specify these parameters. By default, THETA=0 and SIGMA=1. You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for $\theta$ and $\sigma$.
The beta distribution has two shape parameters: $\alpha$ and $\beta$. If these parameters are known, you can specify their values with the ALPHA= and BETA= beta-options. By default, the procedure computes maximum likelihood estimates for $\alpha$ and $\beta$. Note: Three- and four-parameter maximum likelihood estimation might not always converge.
Table 4.17 and Table 4.18 list secondary options you can specify with the BETA option. See the section Beta Distribution for details and Example 4.21 for an example that uses the BETA option.
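As a hedged illustration of the bounds described for the BETA option, the following sketch fixes the lower threshold and the scale and leaves both shape parameters to be estimated. The data set name SteelDemo and its values are hypothetical and are not part of this documentation; they are chosen so that every observation lies strictly between THETA and THETA+SIGMA.

```sas
/* Hypothetical sample data (illustrative only, not from the documentation). */
data SteelDemo;
   input Length @@;
   datalines;
5.62 5.78 5.85 5.91 5.97 6.02 6.05 6.11 6.18 6.24 6.33 6.38
;
run;

ods graphics on;

proc univariate data=SteelDemo;
   /* THETA=5.5 and SIGMA=1 bound the fitted curve to the interval (5.5, 6.5). */
   /* ALPHA= and BETA= are omitted, so both shape parameters are estimated     */
   /* by maximum likelihood, which is the default behavior.                    */
   histogram Length / beta(theta=5.5 sigma=1);
run;
```

Fixing the two bounds leaves only a two-parameter fit, which avoids the three- and four-parameter estimation that the note above warns might not converge.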
BETA=value-list
specifies the second shape parameter $\beta$ for beta density curves requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. By default, or if you specify the value EST, the procedure calculates a maximum likelihood estimate for $\beta$.
C=value-list
specifies the shape parameter $c$ for Weibull density curves requested with the WEIBULL option. Enclose the C= Weibull-option in parentheses after the WEIBULL option. By default, or if you specify the value EST, the procedure calculates a maximum likelihood estimate for $c$. You can specify the SHAPE= Weibull-option as an alias for the C= Weibull-option.
C=value-list
specifies the standardized bandwidth parameter $c$ for kernel density estimates requested with the KERNEL option. Enclose the C= kernel-option in parentheses after the KERNEL option. You can specify a list of values to request multiple estimates. You can specify the value MISE to produce the estimate with a bandwidth that minimizes the approximate mean integrated square error (MISE), or SJPI to select the bandwidth by using the Sheather-Jones plug-in method.
You can also use the C= kernel-option with the K= kernel-option (which specifies the kernel function) to compute multiple estimates. If you specify more kernel functions than bandwidths, the last bandwidth in the list is repeated for the remaining estimates. Similarly, if you specify more bandwidths than kernel functions, the last kernel function is repeated for the remaining estimates. If you do not specify the C= kernel-option, the bandwidth that minimizes the approximate MISE is used for all the estimates.
See the section Kernel Density Estimates for more information about kernel density estimates.
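The pairing rule for the C= and K= kernel-options can be sketched as follows. This is a minimal example under stated assumptions: the data set SteelDemo and its values are hypothetical, and the bandwidth values are arbitrary.

```sas
/* Hypothetical sample data (illustrative only, not from the documentation). */
data SteelDemo;
   input Length @@;
   datalines;
5.62 5.78 5.85 5.91 5.97 6.02 6.05 6.11 6.18 6.24 6.33 6.38
;
run;

ods graphics on;

proc univariate data=SteelDemo;
   /* Three bandwidths but only two kernel functions are listed, so the   */
   /* last kernel function (QUADRATIC) is repeated for the third curve:   */
   /*   curve 1: c=0.5, normal kernel                                     */
   /*   curve 2: c=1.0, quadratic kernel                                  */
   /*   curve 3: c=1.5, quadratic kernel                                  */
   histogram Length / kernel(c=0.5 1.0 1.5 k=normal quadratic);
run;
```

Listing more kernel functions than bandwidths works the other way around: the last bandwidth is reused for the extra kernel functions.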
CBARLINE=color
specifies the color for the outline of the histogram bars when producing traditional graphics. The option does not apply to ODS Graphics output.
CFILL=color
specifies the color to fill the bars of the histogram (or the area under a fitted density curve if you also specify the FILL option) when producing traditional graphics. See the entries for the FILL and PFILL= options for additional details. Refer to SAS/GRAPH: Reference for a list of colors. The option does not apply to ODS Graphics output.
CGRID=color
specifies the color for grid lines when a grid displays on the histogram in traditional graphics. This option also produces a grid if the GRID option is not specified.
CLIPREF
draws reference lines requested with the HREF= and VREF= options behind the histogram bars. When the GSTYLE system option is in effect for traditional graphics, reference lines are drawn in front of the bars by default.
CONTENTS='string'
specifies the table of contents grouping entry for tables associated with a density curve. Enclose the CONTENTS= option in parentheses after the distribution option. You can specify CONTENTS='' to suppress the grouping entry.
DELTA=value-list
specifies the first shape parameter $\delta$ for Johnson $S_B$ and Johnson $S_U$ distribution functions requested with the SB and SU options. Enclose the DELTA= option in parentheses after the SB or SU option. If you do not specify a value for $\delta$, or if you specify the value EST, the procedure calculates an estimate.
EDFNSAMPLES=value
specifies the number of simulation samples used to compute $p$-values for EDF goodness-of-fit statistics for density curves requested with the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose the EDFNSAMPLES= option in parentheses after the distribution option. The default value is 500.
EDFSEED=value
specifies an integer value used to start the pseudo-random number generator when creating simulation samples for computing EDF goodness-of-fit statistic $p$-values for density curves requested with the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose the EDFSEED= option in parentheses after the distribution option. By default, the procedure uses a random number seed generated from reading the time of day from the computer’s clock.
ENDPOINTS <=values | KEY | UNIFORM>
uses histogram bin endpoints as the tick mark values for the horizontal axis and determines how to compute the bin width of the histogram bars. The values specify both the left and right endpoint of each histogram interval. The width of the histogram bars is the difference between consecutive endpoints. The procedure uses the same values for all variables.
The range of endpoints must cover the range of the data. For example, if you specify
endpoints=2 to 10 by 2
then all of the observations must fall in the intervals [2,4), [4,6), [6,8), and [8,10]. The endpoints must also be evenly spaced and listed in increasing order.
KEY
determines the endpoints for the data in the key cell. The initial number of endpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the endpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.
UNIFORM
determines the endpoints by using all the observations as if there were no cells. In other words, the number of endpoints is based on the total sample size by using the method of Terrell and Scott (1985).
Neither KEY nor UNIFORM applies unless you use the CLASS statement.
If you omit ENDPOINTS, the procedure uses the histogram midpoints as horizontal axis tick values. If you specify ENDPOINTS without values, the procedure computes the endpoints by using an algorithm (Terrell and Scott 1985) that is primarily applicable to continuous data that are approximately normally distributed.
If you specify both MIDPOINTS= and ENDPOINTS, the procedure issues a warning message and uses the endpoints.
If you specify RTINCLUDE, the procedure includes the right endpoint of each histogram interval in that interval instead of including the left endpoint.
If you use a CLASS statement and specify ENDPOINTS, the procedure uses ENDPOINTS=KEY as the default. However, if the key cell is empty, then the procedure uses ENDPOINTS=UNIFORM.
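The following sketch shows explicit endpoints combined with RTINCLUDE. The data set SteelDemo and its values are hypothetical; one observation is deliberately placed exactly on an endpoint (5.80) so that the effect of RTINCLUDE is visible in the bar heights.

```sas
/* Hypothetical sample data (illustrative only). The value 5.80 sits   */
/* exactly on a bin boundary.                                          */
data SteelDemo;
   input Length @@;
   datalines;
5.62 5.80 5.85 5.91 5.97 6.02 6.05 6.11 6.18 6.24 6.33 6.38
;
run;

ods graphics on;

proc univariate data=SteelDemo;
   /* Endpoints 5.6, 5.8, 6.0, 6.2, 6.4 are evenly spaced, increasing,    */
   /* and cover the data range (5.62 to 6.38), giving four bins of        */
   /* width 0.2. With RTINCLUDE, 5.80 is counted in the bin that ends at  */
   /* 5.8; without RTINCLUDE, it is counted in the bin that starts there. */
   histogram Length / endpoints=5.6 5.8 6.0 6.2 6.4 rtinclude;
run;
```

Removing RTINCLUDE and rerunning the step is a quick way to confirm which interval a boundary observation falls in.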
EXPONENTIAL(exponential-options)
displays fitted exponential density curves on the histogram. The EXPONENTIAL option can occur only once in a HISTOGRAM statement, but it can request any number of exponential curves. The parameter $\theta$ must be less than or equal to the minimum data value. Use the THETA= exponential-option to specify $\theta$. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for $\theta$. Use the SIGMA= exponential-option to specify $\sigma$. By default, the procedure computes a maximum likelihood estimate for $\sigma$. Table 4.17 and Table 4.19 list options you can specify with the EXPONENTIAL option. See the section Exponential Distribution for details.
FILL
fills areas under the fitted density curve or the kernel density estimate with colors and patterns. The FILL option can occur with only one fitted curve. Enclose the FILL option in parentheses after a density curve option or the KERNEL option. The CFILL= and PFILL= options specify the color and pattern for the area under the curve when producing traditional graphics. For a list of available colors and patterns, see SAS/GRAPH: Reference.
FRONTREF
draws reference lines requested with the HREF= and VREF= options in front of the histogram bars. When the NOGSTYLE system option is in effect for traditional graphics, reference lines are drawn behind the histogram bars by default, and they can be obscured by filled bars.
GAMMA(gamma-options)
displays fitted gamma density curves on the histogram. The GAMMA option can occur only once in a HISTOGRAM statement, but it can request any number of gamma curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= gamma-option to specify $\theta$. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for $\theta$. Use the ALPHA= and the SIGMA= gamma-options to specify the shape parameter $\alpha$ and the scale parameter $\sigma$. By default, PROC UNIVARIATE computes maximum likelihood estimates for $\alpha$ and $\sigma$. The procedure calculates the maximum likelihood estimate of $\alpha$ iteratively by using the Newton-Raphson approximation. Table 4.17 and Table 4.20 list options you can specify with the GAMMA option. See the section Gamma Distribution for details, and see Example 4.22 for an example that uses the GAMMA option.
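A minimal sketch of the GAMMA option with a fixed threshold follows. The data set SteelDemo, its values, and the choices THETA=5 and MAXITER=50 are assumptions made for illustration only.

```sas
/* Hypothetical sample data (illustrative only, not from the documentation). */
data SteelDemo;
   input Length @@;
   datalines;
5.62 5.78 5.85 5.91 5.97 6.02 6.05 6.11 6.18 6.24 6.33 6.38
;
run;

ods graphics on;

proc univariate data=SteelDemo;
   /* THETA=5 is below the smallest observation (5.62), as GAMMA requires. */
   /* ALPHA=EST and SIGMA=EST spell out the default: both parameters are   */
   /* estimated by maximum likelihood.                                     */
   /* MAXITER=50 caps the Newton-Raphson iterations for the shape estimate.*/
   histogram Length / gamma(theta=5 alpha=est sigma=est maxiter=50);
run;
```

Writing EST explicitly changes nothing relative to omitting ALPHA= and SIGMA=, but it makes the intent visible to a reader of the program.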
GAMMA=value-list
specifies the second shape parameter $\gamma$ for Johnson $S_B$ and Johnson $S_U$ distribution functions requested with the SB and SU options. Enclose the GAMMA= option in parentheses after the SB or SU option. If you do not specify a value for $\gamma$, or if you specify the value EST, the procedure calculates an estimate.
GRID
displays a grid on the histogram. Grid lines are horizontal lines that are positioned at major tick marks on the vertical axis.
GUMBEL(Gumbel-options)
displays fitted Gumbel density curves on the histogram. The GUMBEL option can occur only once in a HISTOGRAM statement, but it can request any number of Gumbel curves. Use the MU= and the SIGMA= Gumbel-options to specify the location parameter $\mu$ and the scale parameter $\sigma$. By default, PROC UNIVARIATE computes maximum likelihood estimates for $\mu$ and $\sigma$. Table 4.17 and Table 4.21 list options you can specify with the GUMBEL option. See the section Gumbel Distribution for details about the Gumbel distribution.
HANGING
requests a hanging histogram, as illustrated in Figure 4.7.
You can use the HANGING option only when exactly one fitted density curve is requested. A hanging histogram aligns the tops of the histogram bars (displayed as lines) with the fitted curve. The lines are positioned at the midpoints of the histogram bins. A hanging histogram is a goodness-of-fit diagnostic in the sense that the closer the lines are to the horizontal axis, the better the fit. Hanging histograms are discussed by Tukey (1977), Wainer (1974), and Velleman and Hoaglin (1981).
HOFFSET=value
specifies the offset, in percentage screen units, at both ends of the horizontal axis. You can use HOFFSET=0 to eliminate the default offset.
IGAUSS(iGauss-options)
displays fitted inverse Gaussian density curves on the histogram. The IGAUSS option can occur only once in a HISTOGRAM statement, but it can request any number of inverse Gaussian curves. Use the MU= and the LAMBDA= iGauss-options to specify the location parameter $\mu$ and the shape parameter $\lambda$. By default, PROC UNIVARIATE uses the sample mean for $\mu$ and computes a maximum likelihood estimate for $\lambda$. Table 4.17 and Table 4.22 list options you can specify with the IGAUSS option. See the section Inverse Gaussian Distribution for details.
INTERBAR=value
specifies the space between histogram bars in percentage screen units. If both the INTERBAR= and BARWIDTH= options are specified, the INTERBAR= option takes precedence.
K=NORMAL | QUADRATIC | TRIANGULAR
specifies the kernel function (normal, quadratic, or triangular) used to compute a kernel density estimate. You can specify a list of values to request multiple estimates. You must enclose this option in parentheses after the KERNEL option. You can also use the K= kernel-option with the C= kernel-option, which specifies standardized bandwidths. If you specify more kernel functions than bandwidths, the procedure repeats the last bandwidth in the list for the remaining estimates. Similarly, if you specify more bandwidths than kernel functions, the procedure repeats the last kernel function for the remaining estimates. By default, K=NORMAL.
KERNEL(kernel-options)
superimposes kernel density estimates on the histogram. By default, the procedure uses the AMISE method to compute kernel density estimates. To request multiple kernel density estimates on the same histogram, specify a list of values for the C= kernel-option or K= kernel-option. Table 4.31 lists options you can specify with the KERNEL option. See the section Kernel Density Estimates for more information about kernel density estimates, and see Example 4.23.
LAMBDA=value
specifies the shape parameter $\lambda$ for fitted curves requested with the IGAUSS option. Enclose the LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a value for $\lambda$, the procedure calculates a maximum likelihood estimate.
LGRID=linetype
specifies the line type for the grid when a grid displays on the histogram. This option also creates a grid if the GRID option is not specified.
LOGNORMAL(lognormal-options)
displays fitted lognormal density curves on the histogram. The LOGNORMAL option can occur only once in a HISTOGRAM statement, but it can request any number of lognormal curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= lognormal-option to specify $\theta$. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for $\theta$. Use the SIGMA= and ZETA= lognormal-options to specify $\sigma$ and $\zeta$. By default, the procedure computes maximum likelihood estimates for $\sigma$ and $\zeta$. Table 4.17 and Table 4.23 list options you can specify with the LOGNORMAL option. See the section Lognormal Distribution for details, and see Example 4.22 and Example 4.24 for examples using the LOGNORMAL option.
LOWER=value-list
specifies lower bounds for kernel density estimates requested with the KERNEL option. Enclose the LOWER= option in parentheses after the KERNEL option. If you specify more kernel estimates than lower bounds, the last lower bound is repeated for the remaining estimates. The default is a missing value, indicating no lower bounds for fitted kernel density curves.
MAXNBIN=n
limits the number of bins displayed in the comparative histogram. This option is useful when the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins can be so great that each cell histogram is scaled into a narrow region. By using MAXNBIN= to limit the number of bins, you can narrow the window about the data distribution in the key cell. This option is not available unless you specify the CLASS statement. The MAXNBIN= option is an alternative to the MAXSIGMAS= option.
MAXSIGMAS=value
limits the number of bins displayed in the comparative histogram to a range of value standard deviations (of the data in the key cell) above and below the mean of the data in the key cell. This option is useful when the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins can be so great that each cell histogram is scaled into a narrow region. By using MAXSIGMAS= to limit the number of bins, you can narrow the window that surrounds the data distribution in the key cell. This option is not available unless you specify the CLASS statement.
MIDPERCENTS
requests a table listing the midpoints and percentage of observations in each histogram interval. If you specify MIDPERCENTS in parentheses after a density estimate option, the procedure displays a table that lists the midpoints, the observed percentage of observations, and the estimated percentage of the population in each interval (estimated from the fitted distribution). See Example 4.18.
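The two placements of MIDPERCENTS can be compared side by side. The sketch below uses a hypothetical data set, SteelDemo, and two HISTOGRAM statements in one PROC UNIVARIATE step so that each placement produces its own table.

```sas
/* Hypothetical sample data (illustrative only, not from the documentation). */
data SteelDemo;
   input Length @@;
   datalines;
5.62 5.78 5.85 5.91 5.97 6.02 6.05 6.11 6.18 6.24 6.33 6.38
;
run;

ods graphics on;

proc univariate data=SteelDemo;
   /* After the slash: table of midpoints and observed percentages. */
   histogram Length / midpercents;

   /* Inside the NORMAL parentheses: the table also lists percentages */
   /* estimated from the fitted normal curve.                         */
   histogram Length / normal(midpercents);
run;
```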
MIDPOINTS=values | KEY | UNIFORM
specifies how to determine the midpoints for the histogram intervals, where values determines the width of the histogram bars as the difference between consecutive midpoints. The procedure uses the same values for all variables.
The range of midpoints, extended at each end by half of the bar width, must cover the range of the data. For example, if you specify
midpoints=2 to 10 by 0.5
then all of the observations should fall between 1.75 and 10.25. You must use evenly spaced midpoints listed in increasing order.
KEY
determines the midpoints for the data in the key cell. The initial number of midpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the midpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.
UNIFORM
determines the midpoints by using all the observations as if there were no cells. In other words, the number of midpoints is based on the total sample size by using the method of Terrell and Scott (1985).
Neither KEY nor UNIFORM applies unless you use the CLASS statement. By default, if you use a CLASS statement, MIDPOINTS=KEY; however, if the key cell is empty, then MIDPOINTS=UNIFORM. Otherwise, the procedure computes the midpoints by using an algorithm (Terrell and Scott 1985) that is primarily applicable to continuous data that are approximately normally distributed.
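To see MIDPOINTS=UNIFORM in a comparative histogram, a CLASS statement is needed. The sketch below assumes a hypothetical data set, SupplierDemo, with a class variable Supplier whose two levels cover different ranges of Length.

```sas
/* Hypothetical two-supplier data; names and values are illustrative only. */
data SupplierDemo;
   input Supplier $ Length @@;
   datalines;
A 5.62 A 5.85 A 5.97 A 6.05 A 6.18 A 6.33
B 6.41 B 6.58 B 6.72 B 6.80 B 6.95 B 7.10
;
run;

ods graphics on;

proc univariate data=SupplierDemo;
   class Supplier;
   /* UNIFORM bases the midpoints on all 12 observations, ignoring cells. */
   /* Replacing it with MIDPOINTS=KEY (the default with a CLASS           */
   /* statement) bases the midpoints on the key cell and then extends     */
   /* the list to cover the other cell.                                   */
   histogram Length / midpoints=uniform;
run;
```

Because the two cells here barely overlap, switching between KEY and UNIFORM is a simple way to observe how the choice changes the bin layout.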
specifies the parameter $\mu$ for Gumbel, inverse Gaussian, and normal density curves requested with the GUMBEL, IGAUSS, and NORMAL options, respectively. Enclose the MU= option in parentheses after the distribution keyword. By default, or if you specify the value EST, the procedure uses the sample mean for $\mu$ for normal and inverse Gaussian distributions and computes a maximum likelihood estimate of $\mu$ for the Gumbel distribution. For more details, see the sections Inverse Gaussian Distribution and Gumbel Distribution.
uses histogram interval endpoints as the tick mark values for the horizontal axis and determines the number of bins.
specifies the number of histogram intervals.
suppresses drawing of histogram bars, which is useful for viewing fitted curves only.
suppresses the creation of a plot. Use this option when you only want to tabulate summary statistics for a fitted density or create an OUTHISTOGRAM= data set.
suppresses tables summarizing the fitted curve. Enclose the NOPRINT option in parentheses following the distribution option.
displays fitted normal density curves on the histogram. The NORMAL option can occur only once in a HISTOGRAM statement, but it can request any number of normal curves. Use the MU= and SIGMA= normal-options to specify $\mu$ and $\sigma$. By default, the procedure uses the sample mean and sample standard deviation for $\mu$ and $\sigma$. Table 4.17 and Table 4.24 list options you can specify with the NORMAL option. See the section Normal Distribution for details, and see Example 4.19 for an example that uses the NORMAL option.
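A minimal sketch of the NORMAL option with default and with fixed parameters follows. It assumes that the SASHELP.CLASS sample data set is available; the values 62 and 5 are illustrative only and are not taken from this documentation.

```sas
proc univariate data=sashelp.class noprint;
   /* Default: sample mean and sample standard deviation */
   histogram Height / normal;
   /* Fixed parameters supplied in parentheses (illustrative values) */
   histogram Height / normal(mu=62 sigma=5);
run;
```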
suppresses the table of contents entries for tables produced by the HISTOGRAM statement.
creates a SAS data set that contains information about histogram intervals. Specifically, the data set contains the midpoints of the histogram intervals (or the lower endpoints of the intervals if you specify the ENDPOINTS option), the observed percentage of observations in each interval, and the estimated percentage of observations in each interval (estimated from each of the specified fitted curves).
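For instance, the following sketch saves the interval information for a histogram with a fitted normal curve and suppresses the plot itself. The output data set name HistOut is hypothetical, and the example assumes that the SASHELP.CLASS sample data set is available.

```sas
proc univariate data=sashelp.class noprint;
   /* NOPLOT suppresses the graph; the interval data are still written */
   histogram Height / normal noplot outhistogram=HistOut;
run;

/* Inspect the saved midpoints and percentages */
proc print data=HistOut;
run;
```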
displays fitted generalized Pareto density curves on the histogram. The PARETO option can occur only once in a HISTOGRAM statement, but it can request any number of generalized Pareto curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= Pareto-option to specify $\theta$. By default, THETA=0. Use the SIGMA= and ALPHA= Pareto-options to specify the scale parameter $\sigma$ and the shape parameter $\alpha$. By default, PROC UNIVARIATE computes maximum likelihood estimates for $\sigma$ and $\alpha$. Table 4.17 and Table 4.25 list options you can specify with the PARETO option. See the section Generalized Pareto Distribution for details.
specifies a list of percents for which quantiles calculated from the data and quantiles estimated from the fitted curve are tabulated. The percents must be between 0 and 100. Enclose the PERCENTS= option in parentheses after the curve option. The default percents are 1, 5, 10, 25, 50, 75, 90, 95, and 99.
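As a brief sketch, the following statements replace the default percent list with four values for a fitted normal curve. The example assumes that the SASHELP.CLASS sample data set is available.

```sas
proc univariate data=sashelp.class noprint;
   /* Tabulate observed and estimated quantiles at 20, 40, 60, and 80 percent */
   histogram Height / normal(percents=20 40 60 80);
run;
```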
specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also specify the FILL option) when producing traditional graphics. See the entries for the CFILL= and FILL options for additional details. Refer to SAS/GRAPH: Reference for a list of pattern values. The option does not apply to ODS Graphics output.
displays fitted power function density curves on the histogram. The POWER option can occur only once in a HISTOGRAM statement, but it can request any number of power function curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= and SIGMA= power-options to specify $\theta$ and $\sigma$. The default values are 0 and 1, respectively. Use the ALPHA= power-option to specify the shape parameter $\alpha$. By default, PROC UNIVARIATE computes a maximum likelihood estimate for $\alpha$. Table 4.17 and Table 4.26 list options you can specify with the POWER option. See the section Power Function Distribution for details.
displays fitted Rayleigh density curves on the histogram. The RAYLEIGH option can occur only once in a HISTOGRAM statement, but it can request any number of Rayleigh curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= Rayleigh-option to specify $\theta$. By default, THETA=0. Use the SIGMA= Rayleigh-option to specify the scale parameter $\sigma$. By default, PROC UNIVARIATE computes a maximum likelihood estimate for $\sigma$. Table 4.17 and Table 4.27 list options you can specify with the RAYLEIGH option. See the section Rayleigh Distribution for details.
includes the right endpoint of each histogram interval in that interval. By default, the left endpoint is included in the histogram interval.
displays fitted Johnson $S_B$ density curves on the histogram. The SB option can occur only once in a HISTOGRAM statement, but it can request any number of Johnson $S_B$ curves. Use the THETA= and SIGMA= SB-options to specify $\theta$ and $\sigma$. By default, the procedure computes maximum likelihood estimates of $\theta$ and $\sigma$. Table 4.17 and Table 4.28 list options you can specify with the SB option. See the section Johnson Distribution for details.
specifies the parameter $\sigma$ for the fitted density curve when you request the BETA, EXPONENTIAL, GAMMA, GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, SB, SU, or WEIBULL options.
See Table 4.34 for a summary of how to use the SIGMA= option. You must enclose this option in parentheses after the density curve option. You can specify the value EST to request a maximum likelihood estimate for $\sigma$.
Distribution Keyword | SIGMA= Specifies | Default Value | Alias |
---|---|---|---|
BETA | scale parameter $\sigma$ | 1 | SCALE= |
EXPONENTIAL | scale parameter $\sigma$ | maximum likelihood estimate | SCALE= |
GAMMA | scale parameter $\sigma$ | maximum likelihood estimate | SCALE= |
GUMBEL | scale parameter $\sigma$ | maximum likelihood estimate | |
LOGNORMAL | shape parameter $\sigma$ | maximum likelihood estimate | SHAPE= |
NORMAL | scale parameter $\sigma$ | standard deviation | |
PARETO | scale parameter $\sigma$ | maximum likelihood estimate | |
POWER | scale parameter $\sigma$ | 1 | |
RAYLEIGH | scale parameter $\sigma$ | maximum likelihood estimate | |
SB | scale parameter $\sigma$ | 1 | SCALE= |
SU | scale parameter $\sigma$ | percentile-based estimate | |
WEIBULL | scale parameter $\sigma$ | maximum likelihood estimate | SCALE= |
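The following sketch shows the SIGMA= option with a fixed value and with the EST keyword for an exponential curve. The data set Waits and the variable t are hypothetical; the data are simulated with a scale of 2 so that the fixed value is plausible.

```sas
/* Hypothetical data: exponential waiting times with scale 2 */
data Waits;
   call streaminit(5);
   do i = 1 to 300;
      t = 2*rand('exponential');
      output;
   end;
   drop i;
run;

proc univariate data=Waits noprint;
   /* Fix the scale parameter at 2 */
   histogram t / exponential(sigma=2);
   /* Request the maximum likelihood estimate explicitly */
   histogram t / exponential(sigma=est);
run;
```

Because SCALE= is listed as an alias for EXPONENTIAL, writing exponential(scale=2) should be equivalent to the first HISTOGRAM statement.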
displays fitted Johnson $S_U$ density curves on the histogram. The SU option can occur only once in a HISTOGRAM statement, but it can request any number of Johnson $S_U$ curves. Use the THETA= and SIGMA= SU-options to specify $\theta$ and $\sigma$. By default, the procedure computes maximum likelihood estimates of $\theta$ and $\sigma$. Table 4.17 and Table 4.29 list options you can specify with the SU option. See the section Johnson Distribution for details.
specifies the lower threshold parameter $\theta$ for curves requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, SB, SU, and WEIBULL options. Enclose the THETA= option in parentheses after the curve option. By default, THETA=0. If you specify the value EST, an estimate is computed for $\theta$.
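A short sketch of a known threshold versus an estimated threshold follows. The data set Shifted and the variable y are hypothetical; the data are simulated so that every value exceeds 5, which keeps THETA=5 below the minimum data value.

```sas
/* Hypothetical data: lognormal values shifted to lie above 5 */
data Shifted;
   call streaminit(11);
   do i = 1 to 300;
      y = 5 + exp(rand('normal', 0, 0.5));
      output;
   end;
   drop i;
run;

proc univariate data=Shifted noprint;
   /* Known threshold supplied directly */
   histogram y / lognormal(theta=5);
   /* Threshold estimated from the data */
   histogram y / lognormal(theta=est);
run;
```

With the default THETA=0, the fitted curve would treat the data as starting at zero, which is usually a poor description of data that are bounded well away from zero.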
specifies upper bounds for kernel density estimates requested with the KERNEL option. Enclose the UPPER= option in parentheses after the KERNEL option. If you specify more kernel estimates than upper bounds, the last upper bound is repeated for the remaining estimates. The default is a missing value, indicating no upper bounds for fitted kernel density curves.
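As a sketch of a bounded kernel estimate, the following statements simulate hypothetical scores that cannot exceed 100 and pass that limit to the KERNEL option. The data set Scores and the variable Score are assumptions made for illustration.

```sas
/* Hypothetical data: scores between 0 and 100, concentrated near the top */
data Scores;
   call streaminit(3);
   do i = 1 to 300;
      Score = 100*rand('beta', 5, 1.5);
      output;
   end;
   drop i;
run;

proc univariate data=Scores noprint;
   /* Keep the kernel density estimate from extending past 100 */
   histogram Score / kernel(upper=100);
run;
```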
specifies the offset, in percentage screen units, at the upper end of the vertical axis.
specifies the scale of the vertical axis for a histogram. The value COUNT requests the data be scaled in units of the number of observations per data unit. The value PERCENT requests the data be scaled in units of percent of observations per data unit. The value PROPORTION requests the data be scaled in units of proportion of observations per data unit. The default is PERCENT.
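The vertical-axis scales can be compared with a sketch like the following, which assumes that the SASHELP.CLASS sample data set is available.

```sas
proc univariate data=sashelp.class noprint;
   /* Vertical axis in counts instead of the default percents */
   histogram Height / vscale=count;
   /* Vertical axis in proportions */
   histogram Height / vscale=proportion;
run;
```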
specifies the width of bar outlines when producing traditional graphics. The option does not apply to ODS Graphics output.
displays fitted Weibull density curves on the histogram. The WEIBULL option can occur only once in a HISTOGRAM statement, but it can request any number of Weibull curves. The parameter $\theta$ must be less than the minimum data value. Use the THETA= Weibull-option to specify $\theta$. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for $\theta$. Use the C= and SIGMA= Weibull-options to specify the shape parameter $c$ and the scale parameter $\sigma$. By default, the procedure computes the maximum likelihood estimates for $c$ and $\sigma$. Table 4.17 and Table 4.30 list options you can specify with the WEIBULL option. See the section Weibull Distribution for details, and see Example 4.22 for an example that uses the WEIBULL option.
PROC UNIVARIATE calculates the maximum likelihood estimate of $c$ iteratively by using the Newton-Raphson approximation. See also the C=, SIGMA=, and THETA= Weibull-options.
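The following sketch fits a Weibull curve with the default settings and then with the same settings written out explicitly. The data set Life and the variable Hours are hypothetical; the data are simulated with shape 2, scale 5, and threshold 0.

```sas
/* Hypothetical data: Weibull lifetimes with shape 2 and scale 5 */
data Life;
   call streaminit(8);
   do i = 1 to 300;
      Hours = rand('weibull', 2, 5);
      output;
   end;
   drop i;
run;

proc univariate data=Life noprint;
   /* Defaults: THETA=0, shape and scale estimated by maximum likelihood */
   histogram Hours / weibull;
   /* The same request with each option spelled out */
   histogram Hours / weibull(theta=0 c=est sigma=est);
run;
```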
specifies the line thickness for the grid when producing traditional graphics. The option does not apply to ODS Graphics output.
specifies a value for the scale parameter $\zeta$ for lognormal density curves requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. By default, or if you specify the value EST, the procedure calculates a maximum likelihood estimate for $\zeta$. You can specify the SCALE= option as an alias for the ZETA= option.
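A minimal sketch of the ZETA= option follows. The data set Sizes and the variable d are hypothetical; the data are simulated as the exponential of a normal variate with mean 1 and standard deviation 0.4, so the fixed values in the second statement are plausible for these data only.

```sas
/* Hypothetical data: lognormal values with threshold 0 */
data Sizes;
   call streaminit(14);
   do i = 1 to 300;
      d = exp(rand('normal', 1, 0.4));
      output;
   end;
   drop i;
run;

proc univariate data=Sizes noprint;
   /* Scale and shape both estimated by maximum likelihood */
   histogram d / lognormal(zeta=est sigma=est);
   /* Scale and shape fixed at illustrative values */
   histogram d / lognormal(zeta=1 sigma=0.4);
run;
```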