IMSTAT Procedure (Analytics)

HISTOGRAM Statement

The HISTOGRAM statement calculates a histogram table for numeric variables.

Syntax

HISTOGRAM <variable-list> </ options>;

Optional Argument

variable-list

specifies a single numeric variable or a list of numeric variables. Separate variable names with at least one space. If you omit the variable list, a histogram table is calculated for each numeric variable.

HISTOGRAM Statement Options

BINVALS=list-of-values

specifies an array of NBINS lower bin boundaries as list-of-values. The histogram binning then uses those values strictly and does not alter them so that they are equally spaced (or “nice”). This option is useful to compute a histogram with bins that are the same as those of another histogram so that the values can be compared or overlaid. The bins do not need to be equally spaced.

EQUALFREQ

specifies to create bins such that each bin contains the same fraction of the data.

Alias EQUAL
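To make the idea of equal-frequency bins concrete, here is a minimal Python sketch. It is not the IMSTAT implementation: the function name `equal_frequency_counts` is hypothetical, and the quantile method is the Python standard-library default, which might differ from the method that the server uses.

```python
import bisect
import statistics

def equal_frequency_counts(values, nbins):
    """Cut values at quantiles so each bin holds about the same share of the data.

    Hypothetical helper for illustration; not part of SAS.
    """
    # statistics.quantiles returns nbins - 1 interior cut points.
    cuts = statistics.quantiles(values, n=nbins)
    counts = [0] * nbins
    for v in values:
        # Index of the bin that v falls into, given the sorted cut points.
        counts[bisect.bisect_right(cuts, v)] += 1
    return cuts, counts

cuts, counts = equal_frequency_counts(list(range(1, 13)), 4)
print(cuts)
print(counts)
```

With twelve distinct values and four bins, each bin receives three observations. With tied values, the fractions can only be approximately equal.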

MAX=number

specifies the upper end of the range to determine the histogram bins. By default, the maximum value is determined from the data (subject to the WHERE clause). The bins of the histogram can extend beyond the extreme values when the "nice-ing" algorithm places bin boundaries on numbers that are convenient to label on axes.

MIN=number

specifies the lower end of the range to determine the histogram bins. By default, the minimum value is determined from the data (subject to the WHERE clause). The bins of the histogram can extend beyond the extreme values when the "nice-ing" algorithm places bin boundaries on numbers that are convenient to label on axes.

NBINS=k

specifies the number of bins to use for calculating the histogram.

NOEMPTYBIN

prevents bins without observations from being displayed. The leading and trailing empty bins are trimmed. Any internal empty bins are combined into the first non-empty bin to the immediate right. The mid-value of the bin into which the empty bins are combined is not adjusted. If the mid-value is not missing, then you can use the asymmetry of a bin as an indicator that it was combined with empty bins.
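The trimming and combining rules can be sketched in Python as follows. This is an illustration of the description above under stated assumptions, not the procedure's code: the tuple layout `(lower, upper, mid, count)` and the function name `drop_empty_bins` are hypothetical.

```python
def drop_empty_bins(bins):
    """Apply the NOEMPTYBIN rules to a list of (lower, upper, mid, count) tuples.

    Hypothetical helper for illustration; not part of SAS.
    """
    # Trim leading and trailing empty bins.
    start, end = 0, len(bins)
    while start < end and bins[start][3] == 0:
        start += 1
    while end > start and bins[end - 1][3] == 0:
        end -= 1

    result = []
    pending_lower = None  # lower bound of the first bin in a run of empty bins
    for lower, upper, mid, count in bins[start:end]:
        if count == 0:
            if pending_lower is None:
                pending_lower = lower
            continue
        if pending_lower is not None:
            # Absorb the empty run into this bin; the mid-value stays unchanged.
            lower, pending_lower = pending_lower, None
        result.append((lower, upper, mid, count))
    return result

bins = [(0, 10, 5, 0), (10, 20, 15, 3), (20, 30, 25, 0),
        (30, 40, 35, 0), (40, 50, 45, 2), (50, 60, 55, 0)]
print(drop_empty_bins(bins))
```

In this example the last remaining bin spans 20 to 50 but keeps the mid-value 45, which is the asymmetry that signals a combined bin.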

NONICE

specifies that the "nice-ing" algorithm is suspended. The boundaries of the histogram are based on the actual range of the data (subject to the WHERE clause) or on the MIN= and MAX= values that you specify. The bin boundaries are not guaranteed to fall on "nice" values.
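As a rough illustration of why boundaries are not "nice" without the algorithm, the following Python sketch divides a range into equal-width bins. It assumes that the bins are equally spaced between the minimum and the maximum; the function name `equal_width_edges` is hypothetical and the sketch does not reproduce the server's actual computation.

```python
def equal_width_edges(minimum, maximum, nbins):
    """Return nbins + 1 equally spaced bin edges from minimum to maximum.

    Hypothetical helper for illustration; not part of SAS.
    """
    if nbins < 1 or maximum <= minimum:
        raise ValueError("need nbins >= 1 and maximum > minimum")
    width = (maximum - minimum) / nbins
    edges = [minimum + i * width for i in range(nbins)]
    edges.append(float(maximum))  # use the exact maximum as the last edge
    return edges

print(equal_width_edges(3, 47, 4))
```

For a range of 3 to 47 and four bins, the edges are 3, 14, 25, 36, and 47, which follow the data rather than round numbers such as 0, 10, 20.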

OUTLIERBIN

specifies that outliers are placed in special bins in the two tails. Outliers with values that fall below Q1 – 1.5*IQR are placed in the left-most bin. Outliers with values that are above Q3 + 1.5*IQR are placed in the right-most bin. IQR is the inter-quartile range (Q3 – Q1), which covers the central 50% of the distribution of the variable. The mid-value reported by the IMSTAT procedure can be used as an indicator of whether a bin is an outlier bin. The mid-value is set to missing for an outlier bin.

Interaction This option is ignored if you specify the EQUALFREQ option.
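The outlier thresholds can be worked through with a short Python sketch. The function name `outlier_fences` is hypothetical, and the quartiles come from the Python standard library's default method, which is an assumption; the quantile definition that the server uses might give slightly different values.

```python
import statistics

def outlier_fences(values):
    """Return (lower, upper) thresholds Q1 - 1.5*IQR and Q3 + 1.5*IQR.

    Hypothetical helper for illustration; not part of SAS.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 50]
lower, upper = outlier_fences(data)
left_tail = [v for v in data if v < lower]    # would go to the left-most bin
right_tail = [v for v in data if v > upper]   # would go to the right-most bin
print(lower, upper, left_tail, right_tail)
```

Here Q1 is 4 and Q3 is 10, so the thresholds are -5 and 19, and only the value 50 lands in the right-hand outlier bin.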

ROUNDINGDIRECTION=direction

specifies the direction to round numbers when a rounding factor is specified. For example, if you specify ROUNDINGFACTOR=5, a bin boundary of 6.2 is rounded up to 10, down to 5, and nearest to 5.

The following directions are valid in the HISTOGRAM statement:
UP Round up to a multiple of the ROUNDINGFACTOR= value.
DOWN Round down to a multiple of the ROUNDINGFACTOR= value.
NEAREST Round to the nearest multiple of the ROUNDINGFACTOR= value.
Default UP
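The three directions can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `round_to_multiple` is hypothetical, and the handling of exact ties for NEAREST (rounded upward here) is an assumption that the documentation does not specify.

```python
import math

def round_to_multiple(value, factor, direction="UP"):
    """Round value to a multiple of factor in the given direction.

    Hypothetical helper for illustration; not part of SAS.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    ratio = value / factor
    if direction == "UP":
        steps = math.ceil(ratio)
    elif direction == "DOWN":
        steps = math.floor(ratio)
    elif direction == "NEAREST":
        steps = math.floor(ratio + 0.5)  # assumption: ties round upward
    else:
        raise ValueError("direction must be UP, DOWN, or NEAREST")
    return steps * factor

for direction in ("UP", "DOWN", "NEAREST"):
    print(direction, round_to_multiple(6.2, 5, direction))
```

For a boundary of 6.2 and a factor of 5, the sketch reproduces the values in the example above: 10 for UP, 5 for DOWN, and 5 for NEAREST.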

ROUNDINGFACTOR=value

specifies the factor to use for rounding up internal bin boundaries. The lower bound of the left-most bin and the upper bound of the right-most bin are not rounded. For example, when you work with prices in dollars, specifying ROUNDINGFACTOR=0.01 rounds the bin boundaries to cents. If the specified rounding factor is greater than the bin width and multiple bins round up to the same number, the bins are collapsed into a single bin.
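The collapsing behavior can be illustrated with a small Python sketch. It rounds internal boundaries up, leaves the outer boundaries alone, and merges boundaries that coincide. The function name `round_internal_edges` is hypothetical, and dropping a rounded boundary that reaches the outermost upper bound is an assumption made for this sketch, not documented behavior.

```python
import math

def round_internal_edges(edges, factor):
    """Round internal bin edges up to a multiple of factor and collapse duplicates.

    Hypothetical helper for illustration; not part of SAS.
    edges must be sorted and contain at least two values.
    """
    lowest, highest = edges[0], edges[-1]
    kept = [lowest]                      # outer lower bound is not rounded
    for edge in edges[1:-1]:
        rounded = math.ceil(edge / factor) * factor
        # Keep the edge only if it still separates two distinct bins.
        if kept[-1] < rounded < highest:
            kept.append(rounded)
    kept.append(highest)                 # outer upper bound is not rounded
    return kept

print(round_internal_edges([0, 2, 4, 6, 8, 10], 5))
```

With a bin width of 2 and a rounding factor of 5, the boundaries 2 and 4 both round to 5, so five bins collapse into the two bins 0 to 5 and 5 to 10.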

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements such as STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias TN=

Details

ODS Table Names

The HISTOGRAM statement generates the following ODS table for each analysis variable.

ODS Table Name    Description       Option
Histogram         Histogram data    Default

For information about using the ODS table with the SAVE= option, see the Details section of the STORE statement.