The HISTOGRAM statement creates histograms and optionally superimposes estimated parametric and nonparametric probability density curves. You cannot use the WEIGHT statement with the HISTOGRAM statement. You can use any number of HISTOGRAM statements after a PROC UNIVARIATE statement. The components of the HISTOGRAM statement are as follows.
Table 4.5 lists primary options that display parametric density estimates on the histogram. You can specify each primary option once in a given HISTOGRAM statement, and each primary option can display multiple curves from its family on the histogram.
Table 4.5: Primary Options for Parametric Fitted Distribution
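Option | Description |
---|---|
BETA(beta-options) | fits beta distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\alpha$ and $\beta$ |
EXPONENTIAL(exponential-options) | fits exponential distribution with threshold parameter $\theta$ and scale parameter $\sigma$ |
GAMMA(gamma-options) | fits gamma distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
GUMBEL(Gumbel-options) | fits Gumbel distribution with location parameter $\mu$ and scale parameter $\sigma$ |
IGAUSS(iGauss-options) | fits inverse Gaussian distribution with location parameter $\mu$ and shape parameter $\lambda$ |
LOGNORMAL(lognormal-options) | fits lognormal distribution with threshold parameter $\theta$, scale parameter $\zeta$, and shape parameter $\sigma$ |
NORMAL(normal-options) | fits normal distribution with mean $\mu$ and standard deviation $\sigma$ |
PARETO(Pareto-options) | fits generalized Pareto distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
POWER(power-options) | fits power function distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$ |
RAYLEIGH(Rayleigh-options) | fits Rayleigh distribution with threshold parameter $\theta$ and scale parameter $\sigma$ |
SB(SB-options) | fits Johnson $S_B$ distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\delta$ and $\gamma$ |
SU(SU-options) | fits Johnson $S_U$ distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\delta$ and $\gamma$ |
WEIBULL(Weibull-options) | fits Weibull distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $c$ |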
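As a hedged illustration of the primary options in Table 4.5, the following sketch superimposes curves from two distribution families on one histogram. The data set SASHELP.CLASS and its variable Height are assumptions made for the example; they do not appear in this section.

```sas
/* Sketch only: assumes the SASHELP.CLASS sample data set is available. */
/* Each primary option (NORMAL, LOGNORMAL) appears once in the statement. */
proc univariate data=sashelp.class;
   histogram Height / normal lognormal;
run;
```

With no secondary options specified, the parameters of each curve are estimated from the data.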
Table 4.6 lists secondary options that specify parameters for fitted parametric distributions and that control the display of fitted curves. Specify these secondary options in parentheses after the primary distribution option. For example, you can fit a normal curve by specifying the NORMAL option as follows:
    proc univariate;
       histogram / normal(color=red mu=10 sigma=0.5);
    run;

The COLOR= normal-option draws the curve in red, and the MU= and SIGMA= normal-options specify the parameters $\mu = 10$ and $\sigma = 0.5$ for the curve. Note that the sample mean and sample standard deviation are used to estimate $\mu$ and $\sigma$, respectively, when the MU= and SIGMA= normal-options are not specified.
You can specify lists of values for secondary options to display more than one fitted curve from the same distribution family on a histogram. Option values are matched by list position. You can specify the value EST in a list of distribution parameter values to use an estimate of the parameter.
For example, the following code displays two normal curves on a histogram:
    proc univariate;
       histogram / normal(color=(red blue) mu=10 est sigma=0.5 est);
    run;

The first curve is red, with $\mu = 10$ and $\sigma = 0.5$. The second curve is blue, with $\mu$ equal to the sample mean and $\sigma$ equal to the sample standard deviation.
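The list-matching rule can be tried on real data with a sketch like the following. It assumes the SASHELP.CLASS sample data set and its Height variable; the fixed values 60 and 5 are illustrative choices, not values taken from this section.

```sas
/* Sketch only: SASHELP.CLASS and Height are assumed for illustration.   */
/* Values are matched by list position:                                  */
/*   curve 1 -> color red,  mu=60,  sigma=5                              */
/*   curve 2 -> color blue, mu=EST, sigma=EST (estimated from the data)  */
proc univariate data=sashelp.class;
   histogram Height / normal(color=(red blue) mu=60 est sigma=5 est);
run;
```

Comparing the two curves shows how far the fixed parameter values are from the estimates.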
See the section Formulas for Fitted Continuous Distributions for detailed information about the families of parametric distributions that you can fit with the HISTOGRAM statement.
Table 4.6: Secondary Options for Parametric Distributions
Option | Description |
---|---|
Options Used with All Distributions | |
COLOR= | specifies colors of density curves |
CONTENTS= | specifies table of contents entry for density curve grouping |
FILL | fills area under density curve |
L= | specifies line types of density curves |
MIDPERCENTS | prints table of midpoints of histogram intervals |
NOPRINT | suppresses tables summarizing curves |
PERCENTS= | lists percents for which quantiles calculated from data and quantiles estimated from curves are tabulated |
W= | specifies widths of density curves |
Beta-Options | |
ALPHA= | specifies first shape parameter $\alpha$ for beta curve |
BETA= | specifies second shape parameter $\beta$ for beta curve |
SIGMA= | specifies scale parameter $\sigma$ for beta curve |
THETA= | specifies lower threshold parameter $\theta$ for beta curve |
Exponential-Options | |
SIGMA= | specifies scale parameter $\sigma$ for exponential curve |
THETA= | specifies threshold parameter $\theta$ for exponential curve |
Gamma-Options | |
ALPHA= | specifies shape parameter $\alpha$ for gamma curve |
ALPHADELTA= | specifies change in successive estimates of $\hat{\alpha}$ at which the Newton-Raphson approximation of $\hat{\alpha}$ terminates |
ALPHAINITIAL= | specifies initial value for $\alpha$ in the Newton-Raphson approximation of $\hat{\alpha}$ |
MAXITER= | specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{\alpha}$ |
SIGMA= | specifies scale parameter $\sigma$ for gamma curve |
THETA= | specifies threshold parameter $\theta$ for gamma curve |
Gumbel-Options | |
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
MU= | specifies location parameter $\mu$ for Gumbel curve |
SIGMA= | specifies scale parameter $\sigma$ for Gumbel curve |
IGauss-Options | |
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
LAMBDA= | specifies shape parameter $\lambda$ for inverse Gaussian curve |
MU= | specifies location parameter $\mu$ for inverse Gaussian curve |
Lognormal-Options | |
SIGMA= | specifies shape parameter $\sigma$ for lognormal curve |
THETA= | specifies threshold parameter $\theta$ for lognormal curve |
ZETA= | specifies scale parameter $\zeta$ for lognormal curve |
Normal-Options | |
MU= | specifies mean $\mu$ for normal curve |
SIGMA= | specifies standard deviation $\sigma$ for normal curve |
Pareto-Options | |
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
ALPHA= | specifies shape parameter $\alpha$ for generalized Pareto curve |
SIGMA= | specifies scale parameter $\sigma$ for generalized Pareto curve |
THETA= | specifies threshold parameter $\theta$ for generalized Pareto curve |
Power-Options | |
ALPHA= | specifies shape parameter $\alpha$ for power function curve |
SIGMA= | specifies scale parameter $\sigma$ for power function curve |
THETA= | specifies threshold parameter $\theta$ for power function curve |
Rayleigh-Options | |
EDFNSAMPLES= | specifies number of samples for EDF goodness-of-fit simulation |
EDFSEED= | specifies seed value for EDF goodness-of-fit simulation |
SIGMA= | specifies scale parameter $\sigma$ for Rayleigh curve |
THETA= | specifies threshold parameter $\theta$ for Rayleigh curve |
Johnson $S_B$-Options | |
DELTA= | specifies first shape parameter $\delta$ for Johnson $S_B$ curve |
FITINTERVAL= | specifies z-value for method of percentiles |
FITMETHOD= | specifies method of parameter estimation |
FITTOLERANCE= | specifies tolerance for method of percentiles |
GAMMA= | specifies second shape parameter $\gamma$ for Johnson $S_B$ curve |
SIGMA= | specifies scale parameter $\sigma$ for Johnson $S_B$ curve |
THETA= | specifies lower threshold parameter $\theta$ for Johnson $S_B$ curve |
Johnson $S_U$-Options | |
DELTA= | specifies first shape parameter $\delta$ for Johnson $S_U$ curve |
FITINTERVAL= | specifies z-value for method of percentiles |
FITMETHOD= | specifies method of parameter estimation |
FITTOLERANCE= | specifies tolerance for method of percentiles |
GAMMA= | specifies second shape parameter $\gamma$ for Johnson $S_U$ curve |
OPTBOUNDRANGE= | specifies the sampling range for parameter starting values in MLE optimization |
OPTMAXITER= | specifies an iteration limit for MLE optimization |
OPTMAXSTARTS= | specifies the maximum number of starting points to be used for MLE optimization |
OPTPRINT | prints an iteration history for MLE optimization |
OPTSEED= | specifies a seed value for MLE optimization |
OPTTOLERANCE= | specifies the optimality tolerance for MLE optimization |
SIGMA= | specifies scale parameter $\sigma$ for Johnson $S_U$ curve |
THETA= | specifies lower threshold parameter $\theta$ for Johnson $S_U$ curve |
Weibull-Options | |
C= | specifies shape parameter $c$ for Weibull curve |
ITPRINT | requests table of iteration history and optimizer details |
MAXITER= | specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{c}$ |
SIGMA= | specifies scale parameter $\sigma$ for Weibull curve |
THETA= | specifies threshold parameter $\theta$ for Weibull curve |
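The following sketch shows how secondary options from Table 4.6 might be combined. It again assumes the SASHELP.CLASS sample data set and its Height variable, and the percent values are arbitrary choices for illustration.

```sas
/* Sketch only: SASHELP.CLASS and Height are assumed for illustration.  */
/* NORMAL:    tabulate quantiles at the 10th, 50th, and 90th percents   */
/*            and print the table of histogram interval midpoints.      */
/* LOGNORMAL: draw the curve with line type 2 and suppress its tables.  */
proc univariate data=sashelp.class;
   histogram Height / normal(percents=10 50 90 midpercents)
                      lognormal(l=2 noprint);
run;
```

Secondary options apply only to the primary option whose parentheses enclose them, so the NOPRINT here affects the lognormal tables and leaves the normal tables in place.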
Use the option KERNEL(kernel-options) to compute kernel density estimates. Specify the following secondary options in parentheses after the KERNEL option to control features of density estimates requested with the KERNEL option.
Table 4.7: Kernel-Options
Option | Description |
---|---|
C= | specifies standardized bandwidth parameter $c$ |
COLOR= | specifies color of the kernel density curve |
FILL | fills area under kernel density curve |
K= | specifies type of kernel function |
L= | specifies line type used for kernel density curve |
LOWER= | specifies lower bound for kernel density curve |
UPPER= | specifies upper bound for kernel density curve |
W= | specifies line width for kernel density curve |
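A minimal sketch of the KERNEL option, assuming the SASHELP.CLASS sample data set and its Height variable, requests two kernel density estimates with different bandwidths. The bandwidth values are illustrative only.

```sas
/* Sketch only: SASHELP.CLASS and Height are assumed for illustration.  */
/* Two estimates are requested by listing two bandwidth values in C=.   */
/* K=NORMAL selects the normal kernel; L= assigns line types 1 and 2.   */
proc univariate data=sashelp.class;
   histogram Height / kernel(c=0.5 1.0 k=normal l=1 2);
run;
```

A smaller value of the standardized bandwidth $c$ generally follows the data more closely, and a larger value gives a smoother curve.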
Table 4.8 summarizes options for enhancing histograms.
Table 4.8: General Graphics Options
Option | Description |
---|---|
General Graphics Options | |
BARLABEL= | produces labels above histogram bars |
CLIPCURVES | scales vertical axis without considering fitted curves |
ENDPOINTS= | lists endpoints for histogram intervals |
GRID | creates a grid |
HANGING | constructs hanging histogram |
HREF= | specifies reference lines perpendicular to the horizontal axis |
HREFLABELS= | specifies labels for HREF= lines |
HREFLABPOS= | specifies vertical position of labels for HREF= lines |
MIDPOINTS= | specifies midpoints for histogram intervals |
NENDPOINTS= | specifies number of histogram interval endpoints |
NMIDPOINTS= | specifies number of histogram interval midpoints |
NOBARS | suppresses histogram bars |
NOHLABEL | suppresses label for horizontal axis |
NOPLOT | suppresses plot |
NOVLABEL | suppresses label for vertical axis |
NOVTICK | suppresses tick marks and tick mark labels for vertical axis |
RTINCLUDE | includes right endpoint in interval |
STATREF= | specifies reference lines at values of summary statistics |
STATREFLABELS= | specifies labels for STATREF= lines |
STATREFSUBCHAR= | specifies substitution character for displaying statistic values in STATREFLABELS= labels |
VAXISLABEL= | specifies label for vertical axis |
VREF= | specifies reference lines perpendicular to the vertical axis |
VREFLABELS= | specifies labels for VREF= lines |
VREFLABPOS= | specifies horizontal position of labels for VREF= lines |
VSCALE= | specifies scale for vertical axis |
Options for Traditional Graphics Output | |
ANNOTATE= | specifies annotate data set |
BARWIDTH= | specifies width for the bars |
CAXIS= | specifies color for axis |
CBARLINE= | specifies color for outlines of histogram bars |
CFILL= | specifies color for filling under curve |
CFRAME= | specifies color for frame |
CGRID= | specifies color for grid lines |
CHREF= | specifies colors for HREF= lines |
CLIPREF | draws reference lines behind histogram bars |
CSTATREF= | specifies colors for STATREF= lines |
CTEXT= | specifies color for text |
CVREF= | specifies colors for VREF= lines |
DESCRIPTION= | specifies description for plot in graphics catalog |
FONT= | specifies software font for text |
FRONTREF | draws reference lines in front of histogram bars |
HAXIS= | specifies AXIS statement for horizontal axis |
HEIGHT= | specifies height of text used outside framed areas |
HMINOR= | specifies number of horizontal minor tick marks |
HOFFSET= | specifies offset for horizontal axis |
INFONT= | specifies software font for text inside framed areas |
INHEIGHT= | specifies height of text inside framed areas |
INTERBAR= | specifies space between histogram bars |
LGRID= | specifies a line type for grid lines |
LHREF= | specifies line types for HREF= lines |
LSTATREF= | specifies line types for STATREF= lines |
LVREF= | specifies line types for VREF= lines |
NAME= | specifies name for plot in graphics catalog |
NOFRAME | suppresses frame around plotting area |
PFILL= | specifies pattern for filling under curve |
TURNVLABELS | turns and vertically strings out characters in labels for vertical axis |
VAXIS= | specifies AXIS statement or values for vertical axis |
VMINOR= | specifies number of vertical minor tick marks |
VOFFSET= | specifies length of offset at upper end of vertical axis |
WAXIS= | specifies line thickness for axes and frame |
WBARLINE= | specifies line thickness for bar outlines |
WGRID= | specifies line thickness for grid |
Options for ODS Graphics Output | |
ODSFOOTNOTE= | specifies footnote displayed on histogram |
ODSFOOTNOTE2= | specifies secondary footnote displayed on histogram |
ODSTITLE= | specifies title displayed on histogram |
ODSTITLE2= | specifies secondary title displayed on histogram |
OVERLAY | overlays histograms for different class levels |
Options for Comparative Plots | |
ANNOKEY | applies annotation requested in ANNOTATE= data set to key cell only |
CFRAMESIDE= | specifies color for filling frame for row labels |
CFRAMETOP= | specifies color for filling frame for column labels |
CPROP= | specifies color for proportion of frequency bar |
CTEXTSIDE= | specifies color for row labels of comparative histograms |
CTEXTTOP= | specifies color for column labels of comparative histograms |
INTERTILE= | specifies distance between tiles |
MAXNBIN= | specifies maximum number of bins to display |
MAXSIGMAS= | limits the number of bins that display to within a specified number of standard deviations above and below mean of data in key cell |
NCOLS= | specifies number of columns in comparative histogram |
NROWS= | specifies number of rows in comparative histogram |
Miscellaneous Options | |
CONTENTS= | specifies table of contents entry for histogram grouping |
MIDPERCENTS | creates table of histogram intervals |
NOTABCONTENTS | suppresses table of contents entries for tables produced by HISTOGRAM statement |
OUTHISTOGRAM= | creates a data set containing information about histogram intervals |
OUTKERNEL= | creates a data set containing kernel density estimates |
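Several of the options in Table 4.8 can be combined in one statement, as in the following sketch. The data set SASHELP.CLASS, the variable Height, the output data set name WORK.HEIGHT_BINS, and all numeric values are assumptions chosen for illustration.

```sas
/* Sketch only: data set, variable, and values are assumed.              */
/* MIDPOINTS=     places interval midpoints at 52.5, 57.5, ..., 72.5     */
/* VSCALE=COUNT   scales the vertical axis in counts                     */
/* HREF=          draws a reference line at 60 on the horizontal axis    */
/* HREFLABELS=    labels that reference line                             */
/* OUTHISTOGRAM=  saves information about the histogram intervals        */
proc univariate data=sashelp.class noprint;
   histogram Height / midpoints=52.5 to 72.5 by 5
                      vscale=count
                      href=60
                      hreflabels='60 in.'
                      outhistogram=work.height_bins;
run;

/* Inspect the saved interval information. */
proc print data=work.height_bins;
run;
```

Saving the intervals with OUTHISTOGRAM= is useful when the bin counts are needed for further processing and not only for display.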
The following entries provide detailed descriptions of options in the HISTOGRAM statement. Options marked with † are applicable only when traditional graphics are produced. See the section Dictionary of Common Options for detailed descriptions of options common to all plot statements.