The UNIVARIATE Procedure

OUTHISTOGRAM= Output Data Set

You can create an OUTHISTOGRAM= data set with the HISTOGRAM statement. This data set contains information about histogram intervals. Because you can specify multiple HISTOGRAM statements with the UNIVARIATE procedure, you can create multiple OUTHISTOGRAM= data sets.

An OUTHISTOGRAM= data set contains a group of observations for each variable in the HISTOGRAM statement. The group contains an observation for each interval of the histogram, beginning with the leftmost interval that contains a value of the variable and ending with the rightmost interval that contains a value of the variable. These intervals do not necessarily coincide with the intervals displayed in the histogram because the histogram might be padded with empty intervals at either end. If you superimpose one or more fitted curves on the histogram, the OUTHISTOGRAM= data set contains multiple groups of observations for each variable (one group for each curve). If you use a BY statement, the OUTHISTOGRAM= data set contains groups of observations for each BY group. ID variables are not saved in an OUTHISTOGRAM= data set.
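The grouping by variable described above can be seen in a minimal sketch. The data set name Plates, the variables Thick and Width, and the simulated values are hypothetical and chosen only for illustration; the intervals that PROC UNIVARIATE selects depend on the data.

```sas
/* Hypothetical data: the names Plates, Thick, and Width are illustrative only. */
data Plates;
   call streaminit(2718);
   do i = 1 to 100;
      Thick = rand('NORMAL', 3.5, 0.1);
      Width = rand('NORMAL', 20, 1);
      output;
   end;
   drop i;
run;

/* One HISTOGRAM statement with two analysis variables and one output data set. */
proc univariate data=Plates noprint;
   histogram Thick Width / outhistogram=OutHist;
run;

/* _VAR_ identifies the group of observations that belongs to each variable. */
proc print data=OutHist noobs;
   var _VAR_ _MIDPT_ _COUNT_ _OBSPCT_;
run;
```

The printed data set should show one block of interval observations for Thick followed by one for Width, each running from the first to the last nonempty interval.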

By default, an OUTHISTOGRAM= data set contains the _MIDPT_ variable, whose values identify histogram intervals by their midpoints. When the ENDPOINTS= or NENDPOINTS= option is specified, intervals are identified by endpoint values instead: if the RTINCLUDE option is also specified, the _MAXPT_ variable contains upper endpoint values; otherwise, the _MINPT_ variable contains lower endpoint values. See Example 4.18.
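As a hedged illustration of how the identifying variable changes, the following sketch requests endpoint-based intervals with and without RTINCLUDE. The data set Plates, the variable Thick, and the endpoint list are assumptions made for this example; the list is assumed to span the simulated data.

```sas
/* Hypothetical data: the names Plates and Thick are illustrative only. */
data Plates;
   call streaminit(2718);
   do i = 1 to 100;
      Thick = rand('NORMAL', 3.5, 0.1);
      output;
   end;
   drop i;
run;

/* ENDPOINTS= without RTINCLUDE: intervals are identified by _MINPT_. */
proc univariate data=Plates noprint;
   histogram Thick / endpoints=3.0 to 4.0 by 0.1
                     outhistogram=OutMin;
run;

/* ENDPOINTS= with RTINCLUDE: intervals are identified by _MAXPT_. */
proc univariate data=Plates noprint;
   histogram Thick / endpoints=3.0 to 4.0 by 0.1 rtinclude
                     outhistogram=OutMax;
run;

proc print data=OutMin noobs;
run;

proc print data=OutMax noobs;
run;
```

Comparing the two printed data sets shows which endpoint variable is written in each case.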

Table 4.37: Variables in the OUTHISTOGRAM= Data Set

Variable    Description
--------    -----------
_COUNT_     number of variable values in histogram interval
_CURVE_     name of fitted distribution (if requested in HISTOGRAM statement)
_EXPPCT_    estimated percent of population in histogram interval determined from optional fitted distribution
_MAXPT_     upper endpoint of histogram interval
_MIDPT_     midpoint of histogram interval
_MINPT_     lower endpoint of histogram interval
_OBSPCT_    percent of variable values in histogram interval
_VAR_       variable name
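
The _CURVE_ and _EXPPCT_ variables appear only when a fitted distribution is requested. The following sketch, which assumes a hypothetical data set Plates with a simulated variable Thick, fits a normal curve and then compares the observed and expected percentages for each interval. The variable Diff is introduced here for illustration and is not part of the OUTHISTOGRAM= data set.

```sas
/* Hypothetical data: the names Plates and Thick are illustrative only. */
data Plates;
   call streaminit(2718);
   do i = 1 to 100;
      Thick = rand('NORMAL', 3.5, 0.1);
      output;
   end;
   drop i;
run;

/* Request a fitted normal curve so that _CURVE_ and _EXPPCT_ are saved. */
proc univariate data=Plates noprint;
   histogram Thick / normal outhistogram=OutFit;
run;

/* Diff (illustrative) is the observed percent minus the fitted percent. */
data Compare;
   set OutFit;
   Diff = _OBSPCT_ - _EXPPCT_;
run;

proc print data=Compare noobs;
   var _VAR_ _CURVE_ _MIDPT_ _OBSPCT_ _EXPPCT_ Diff;
run;
```

Large values of Diff in particular intervals suggest where the fitted distribution departs from the data.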