The BOXPLOT Procedure

Output Data Sets

OUTBOX= Data Set

The OUTBOX= data set saves group summary statistics and outlier values. The following variables can be saved:

  • the group variable

  • the variable _VAR_, containing the analysis variable name

  • the variable _TYPE_, identifying features of box-and-whiskers plots

  • the variable _VALUE_, containing values of box-and-whiskers plot features

  • the variable _ID_, containing labels for outliers

  • the variable _HTML_, containing URLs associated with plot features

_ID_ is included in the OUTBOX= data set only if the keyword SCHEMATICID or SCHEMATICIDFAR is specified with the BOXSTYLE= option. _HTML_ is present only if one or more of the HTML=, OUTHIGHHTML=, and OUTLOWHTML= options are specified.
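The following statements are a minimal sketch of requesting an OUTBOX= data set that includes _ID_. They use BOXSTYLE=SCHEMATICID together with an ID statement. The example assumes the SASHELP.CLASS sample data set that ships with SAS. The output data set names Class and ClassBox are hypothetical. The data are sorted by the group variable first as a precaution.

```sas
/* Assumed input: SASHELP.CLASS; sort a copy by the group variable Sex */
proc sort data=sashelp.class out=Class;
   by Sex;
run;

/* BOXSTYLE=SCHEMATICID labels outliers with an ID variable value,  */
/* so the OUTBOX= data set ClassBox also contains the _ID_ variable */
proc boxplot data=Class;
   plot Height*Sex / boxstyle=schematicid outbox=ClassBox;
   id Name;
run;

/* Inspect the saved features: _VAR_, _TYPE_, _VALUE_, and _ID_ */
proc print data=ClassBox;
run;
```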

Each observation in an OUTBOX= data set records the value of a single feature of one group’s box-and-whiskers plot, such as its mean. The _TYPE_ variable identifies the feature whose value is recorded in _VALUE_. Table 26.8 lists valid _TYPE_ variable values.

Table 26.8: Valid _TYPE_ Values in an OUTBOX= Data Set

  _TYPE_      Description
  N           group size
  MIN         minimum group value
  Q1          group first quartile
  MEDIAN      group median
  MEAN        group mean
  Q3          group third quartile
  MAX         group maximum value
  STDDEV      group standard deviation
  LOW         low outlier value
  HIGH        high outlier value
  LOWHISKR    low whisker value, if different from MIN
  HIWHISKR    high whisker value, if different from MAX
  FARLOW      low far outlier value
  FARHIGH     high far outlier value


Additionally, the following variables, if specified, are included:

  • block variables

  • symbol variable

  • BY variables

  • ID variables
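Each observation of an OUTBOX= data set holds one feature, so a WHERE clause on _TYPE_ selects particular features of every group's box plot. This sketch again assumes SASHELP.CLASS. The names Class, ClassBox, and Features are hypothetical. It keeps the quartiles and any outlier observations from a schematic summary.

```sas
/* Assumed input: SASHELP.CLASS, sorted by the group variable Sex */
proc sort data=sashelp.class out=Class;
   by Sex;
run;

/* Save one observation per box plot feature for each group */
proc boxplot data=Class;
   plot Weight*Sex / boxstyle=schematic outbox=ClassBox;
run;

/* Keep only the quartile and outlier features listed in Table 26.8 */
data Features;
   set ClassBox;
   where _TYPE_ in ('Q1', 'MEDIAN', 'Q3', 'LOW', 'HIGH', 'FARLOW', 'FARHIGH');
run;

proc print data=Features;
   var Sex _VAR_ _TYPE_ _VALUE_;
run;
```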

OUTHISTORY= Data Set

The OUTHISTORY= data set saves group summary statistics. The following variables are saved:

  • the group variable

  • group minimum variables, named by the analysis variable name with the suffix L

  • group first-quartile variables, named by the analysis variable name with the suffix 1

  • group mean variables, named by the analysis variable name with the suffix X

  • group median variables, named by the analysis variable name with the suffix M

  • group third-quartile variables, named by the analysis variable name with the suffix 3

  • group maximum variables, named by the analysis variable name with the suffix H

  • group standard deviation variables, named by the analysis variable name with the suffix S

  • group size variables, named by the analysis variable name with the suffix N

If an analysis variable name has the maximum length of 32 characters, PROC BOXPLOT forms summary statistic names from its first 16 characters, its last 15 characters, and the appropriate suffix.
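The naming rule can be illustrated with a DATA step that applies it to a hypothetical 32-character analysis variable name, Width_Measured_In_Millimeters_X1. This sketch only mimics the documented rule (first 16 characters, last 15 characters, then the suffix). It is not how PROC BOXPLOT itself is implemented. Under this rule the 17th character is dropped, so the minimum variable comes out as Width_Measured_I_Millimeters_X1L.

```sas
data StatNames;
   length AnalysisVar $32 Suffix $1 StatName $32;
   /* Hypothetical analysis variable name of the maximum length, 32 characters */
   AnalysisVar = 'Width_Measured_In_Millimeters_X1';
   /* One output row per summary statistic suffix used by OUTHISTORY= */
   do Suffix = 'L', '1', 'X', 'M', '3', 'H', 'S', 'N';
      /* First 16 characters + last 15 characters (positions 18-32) + suffix */
      StatName = cats(substr(AnalysisVar, 1, 16), substr(AnalysisVar, 18, 15), Suffix);
      output;
   end;
run;

proc print data=StatNames;
   var Suffix StatName;
run;
```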

Group summary variables are created for each analysis variable specified in the PLOT statement. For example, consider the following statements:

proc boxplot data=Steel;
   plot (Width Diameter)*Lot / outhistory=Summary;
run;

The data set Summary contains variables named Lot, WidthL, Width1, WidthM, WidthX, Width3, WidthH, WidthS, WidthN, DiameterL, Diameter1, DiameterM, DiameterX, Diameter3, DiameterH, DiameterS, and DiameterN.
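The same naming pattern can be checked with a data set that ships with SAS. This sketch assumes SASHELP.CLASS, with Height and Weight as analysis variables and Sex as the group variable. The names Class and ClassHist are hypothetical. The resulting data set should contain HeightL through HeightN and WeightL through WeightN, following the suffixes listed above.

```sas
/* Assumed input: SASHELP.CLASS, sorted by the group variable Sex */
proc sort data=sashelp.class out=Class;
   by Sex;
run;

/* Two analysis variables, so two sets of summary variables are saved */
proc boxplot data=Class;
   plot (Height Weight)*Sex / outhistory=ClassHist;
run;

/* Show a few of the saved summary variables for each group */
proc print data=ClassHist;
   var Sex HeightN HeightX HeightS WeightN WeightX WeightS;
run;
```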

Additionally, the following variables, if specified, are included:

  • BY variables

  • block variables

  • symbol variable

  • ID variables

An OUTHISTORY= data set does not contain outlier values, so in general it cannot be used to save a schematic box plot. Use an OUTBOX= data set instead to save a schematic box plot summary.
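A schematic summary saved in an OUTBOX= data set can, for instance, be read back with the BOX= option of the PROC BOXPLOT statement to redraw the plot without the raw data. The sketch below assumes SASHELP.CLASS and uses the hypothetical data set names Class and WeightBox. Confirm the BOX= syntax in the PROC BOXPLOT statement reference for your release before relying on it.

```sas
/* Assumed input: SASHELP.CLASS, sorted by the group variable Sex */
proc sort data=sashelp.class out=Class;
   by Sex;
run;

/* Save a schematic box plot summary, including any outlier values */
proc boxplot data=Class;
   plot Weight*Sex / boxstyle=schematic outbox=WeightBox;
run;

/* Redraw the schematic plot from the saved summary, not the raw data */
proc boxplot box=WeightBox;
   plot Weight*Sex / boxstyle=schematic;
run;
```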