Example Program and Statement Details

Example Graph

The following graph was generated by the Example Program:
Example Box Plot Graph

Example Program

proc template;
  define statgraph boxplot;
    begingraph;
      entrytitle "City Mileage for Vehicle Types";
      layout overlay;
        boxplot y=mpg_city x=type /
          datalabel=make spread=true;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot;
  label type="Vehicle Type";
run;

Statement Summary

The BOXPLOT statement displays a single box if given just a Y argument. It displays multiple boxes if given both Y and X arguments and X has more than one unique value. Whether the X column is numeric or character, the X axis is always TYPE=DISCRETE.
Two basic box plot representations can be drawn with the BOXPLOT statement: a schematic (Tukey) box plot and a skeletal box plot. See the EXTREME= option for details.
The following figure illustrates the box plot elements:
Box Plot Elements
As shown in the figure, the bottom and top edges of the box are located at the 25th and 75th percentiles of the sample. Within the box you can display the median (50th percentile) as a line and the mean as a marker (see DISPLAY= option).
You can also display markers and data labels for outliers. Outliers are observations that are more extreme than the upper and lower fences (±1.5 IQR). Outliers that are beyond the upper and lower far fences (±3 IQR) are called far outliers and can also be identified and labeled. From a graphical perspective, the locations of the fences along the axis are known, but no line or marker displays a fence (see the DISPLAY=, LABELFAR=, and DATALABEL= options).
Finally, you can control the range represented by the whiskers. By default, the whiskers are drawn from the upper edge of the box to the MAX value (the largest value within the upper fence), and from the lower edge of the box to the MIN value (the smallest value within the lower fence). See the EXTREME= option.

Arguments

Y=numeric-column | expression
specifies the column for the Y values. This argument is required.
X=column | expression
specifies the column for the X values. This argument is optional. When it is specified, a box is drawn for each unique X value.

Options

Statement Option     Description
BOXWIDTH=            Specifies the width of a box as a ratio of the maximum possible width.
CAPSHAPE=            Specifies the shape at the ends of the whiskers.
CONNECT=             Specifies that a connect line joins a statistic from box to box.
CONNECTATTRS=        Specifies the properties of the line connecting multiple boxes.
DATALABEL=           Specifies the labels of the outliers.
DATALABELATTRS=      Specifies the color and font attributes of the outlier labels.
DATATRANSPARENCY=    Specifies the degree of the transparency of the box outlines, box fill, whiskers, mean, median, caps, connect lines, and outliers, if displayed.
DISCRETEOFFSET=      Specifies an amount to offset all boxes from the discrete X ticks.
DISPLAY=             Specifies the box plot features to display.
EXTREME=             Specifies whether the whiskers can extend beyond the fences.
FILLATTRS=           Specifies the appearance of the interior fill area of the boxes.
FREQ=                Specifies a numeric column that provides frequencies for each observation read.
LABELFAR=            Specifies whether all outliers or only far outliers are labeled.
LEGENDLABEL=         Specifies the label to be used in a legend.
MEANATTRS=           Specifies the attributes of the marker that represents the mean values.
MEDIANATTRS=         Specifies the properties of the line that represents the median values.
NAME=                Assigns a name to a plot statement for reference in other template statements.
ORIENT=              Specifies the orientation of the Y axis and of the boxes.
OUTLIERATTRS=        Specifies the attributes of the outlier markers.
OUTLINEATTRS=        Specifies the properties of the box outlines.
PERCENTILE=          Specifies one of five definitions used to calculate percentiles.
PRIMARY=             Specifies that the data columns and plot type for this plot be used for determining default axis features.
SPREAD=              Specifies whether outliers with the same value are spread out to avoid overlap.
WHISKERATTRS=        Specifies the line properties of the whiskers and caps.
XAXIS=               Specifies whether data are mapped to the primary X (bottom) axis or the secondary X2 (top) axis.
YAXIS=               Specifies whether data are mapped to the primary Y (left) axis or the secondary Y2 (right) axis.
BOXWIDTH=number
specifies the width of a box as a ratio of the maximum possible width.
Default: .4
Range: 0 (narrowest) to 1 (widest)
CAPSHAPE=SERIF | LINE | BRACKET
specifies the shape at the ends of the whiskers.
Default: The GraphBox:CapStyle style reference.
SERIF
specifies a short line perpendicular to the whisker.
LINE
specifies a line perpendicular to the whisker extending the width of the box.
BRACKET
specifies a line perpendicular to the whisker extending the width of the box with short extensions at the ends drawn in the direction of the box.
Interaction: The cap color and the thickness are specified by the WHISKERATTRS= option. The cap pattern is always solid.
Interaction: The DISPLAY= option must include CAPS for cap lines to be shown.
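The interactions above can be illustrated with a minimal sketch. It assumes the SASHELP.CARS data used by the Example Program; the template name boxplot_caps and the specific color and thickness values are illustrative choices, not taken from the original.

```sas
proc template;
  define statgraph boxplot_caps;            /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage with Bracket Caps";
      layout overlay;
        boxplot y=mpg_city x=type /
          display=(caps fill mean median outliers) /* CAPS must be listed for caps to appear */
          capshape=bracket
          whiskerattrs=(color=black thickness=2);  /* sets whisker and cap color and thickness */
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_caps;
run;
```

Because the cap pattern is always solid, a PATTERN= setting in WHISKERATTRS= would be expected to affect only the whiskers.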
CONNECT= MEAN | MEDIAN | Q1 | Q3 | MIN | MAX
specifies that a connect line joins a statistic from box to box.
Default: The GraphBox:Connect style reference.
Requirement: The DISPLAY= option must contain the CONNECT suboption for the connect line to be displayed.
Interaction: This option only applies when the X= argument is used to generate multiple boxes.
CONNECTATTRS=style-element | style-element (line-options) | (line-options)
specifies the attributes of the lines connecting multiple boxes. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphConnectLine style element.
Interaction: If there is only one box, this option is ignored.
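As a sketch of how CONNECT=, CONNECTATTRS=, and DISPLAY= work together, the following program joins the medians of the boxes with a dashed line. The template name boxplot_connect and the line attributes are assumptions made for illustration.

```sas
proc template;
  define statgraph boxplot_connect;         /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage with Connected Medians";
      layout overlay;
        boxplot y=mpg_city x=type /
          display=(caps fill mean median outliers connect) /* CONNECT enables the line */
          connect=median                                   /* statistic joined from box to box */
          connectattrs=(color=gray pattern=dash);          /* illustrative line options */
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_connect;
run;
```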
DATALABEL=column
specifies the labels of the outliers. Either a numeric or a character column can be used.
Default: no default
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
See also: LABELFAR= option
DATALABELATTRS=style-element | style-element (text-options) | (text-options)
specifies the color and font attributes of the outlier labels. See General Syntax for Attribute Options for the syntax on using a style-element and Text Options for available text-options.
Default: The GraphDataText style element.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
Interaction: If one or more label options are specified and they do not include all the font properties (color, family, size, weight, style), non-specified properties are derived from the GraphDataText style element.
DATATRANSPARENCY=number
specifies the degree of the transparency of the box outlines, box fill, whiskers, mean, median, caps, connect lines, and outliers, if displayed.
Default: 0
Range: 0 (opaque) to 1 (entirely transparent).
DISCRETEOFFSET=number
specifies an amount to offset all boxes from the discrete X ticks.
Note: This feature is for the third maintenance release of SAS 9.2 and later.
Default: 0 (no offset, all boxes are centered on the discrete ticks)
Range: -0.5 to +0.5, where .5 represents half the distance between discrete ticks. A positive offset is to the right when ORIENT=VERTICAL, and up when ORIENT=HORIZONTAL. (If the layout's axis options set REVERSE=TRUE, then the offset direction is also reversed.)
Details: This feature is useful for graphing multiple response variables side by side on a common axis. By default within an overlay-type layout, if multiple BOXPLOT statements are used with different analysis variables, the boxes for matching X values are centered on the ticks. Depending on the data, the boxes might be superimposed. The following code fragment shows the default box positioning:
layout overlay / cycleattrs=true 
    yaxisopts=(label="Miles Per Gallon");

  boxplot x=type y=mpg_city    / name="City";
  boxplot x=type y=mpg_highway / name="Highway";

  discretelegend "City" "Highway";
endlayout;
Box Plot with Boxes Centered on Discrete X Ticks
To place the different response values side by side, you can assign a different offset to each BOXPLOT statement. The BOXWIDTH= option can be used in conjunction with the DISCRETEOFFSET= option to create narrower boxes when desired.
layout overlay / cycleattrs=true 
    yaxisopts=(label="Miles Per Gallon");

  boxplot x=type y=mpg_city    / name="City"
    discreteoffset=.2 ;
  boxplot x=type y=mpg_highway / name="Highway"
    discreteoffset=-.2 ;

  discretelegend "City" "Highway";
endlayout;
Box Plot with Tick Offsets
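The fragment above does not set BOXWIDTH=. A complete sketch that combines the two options might look like the following; the template name boxplot_offset and the width of .3 are illustrative assumptions, chosen so that two boxes offset by .2 in opposite directions are unlikely to overlap.

```sas
proc template;
  define statgraph boxplot_offset;          /* hypothetical template name */
    begingraph;
      entrytitle "City and Highway Mileage by Vehicle Type";
      layout overlay / cycleattrs=true
          yaxisopts=(label="Miles Per Gallon");
        /* narrower boxes, shifted in opposite directions from each tick */
        boxplot x=type y=mpg_city    / name="City"
          discreteoffset=.2  boxwidth=.3;
        boxplot x=type y=mpg_highway / name="Highway"
          discreteoffset=-.2 boxwidth=.3;
        discretelegend "City" "Highway";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_offset;
run;
```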
DISPLAY=STANDARD | ALL | ( display-options )
specifies which additional features of the box plot to display.
Default: The GraphBox:DisplayOpts style reference. If this style element does not exist, the default is STANDARD.
STANDARD
displays this combination of features (CAPS FILL MEAN MEDIAN OUTLIERS)
ALL
displays all features
(display-options)
a list of features, enclosed in parentheses, to be displayed. The list can include any of the following:
CAPS displays caps at the ends of the whiskers
CONNECT displays the line connecting multiple boxes
FILL displays filled boxes
MEAN displays the mean symbol within the box
MEDIAN displays the median line within the box
NOTCHES displays notched boxes
OUTLIERS displays markers for the outliers
Interaction: If EXTREME=TRUE, then the OUTLIERS feature is ignored.
To control the appearance of these features, use the CONNECTATTRS=, FILLATTRS=, MEANATTRS=, MEDIANATTRS=, OUTLIERATTRS=, and WHISKERATTRS= options. The WHISKERATTRS= option affects both the caps and the whiskers.
Details: The endpoints of the notches are at the following computed locations:
$$\text{median} \pm 1.58 \cdot \frac{\text{IQR}}{\sqrt{N}}$$
In the equation, IQR (IQR = Q3 - Q1) is the interquartile range and N is the sample size.
Endpoints of the Notches
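A display-options list that requests notched boxes could be sketched as follows. The template name boxplot_notch is hypothetical, and the particular combination of features is only one possible choice.

```sas
proc template;
  define statgraph boxplot_notch;           /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage with Notched Boxes";
      layout overlay;
        boxplot y=mpg_city x=type /
          /* explicit feature list: the mean marker is omitted, notches are added */
          display=(caps fill median notches outliers);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_notch;
run;
```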
EXTREME=boolean
specifies whether the whiskers can extend beyond the fences.
Default: FALSE
FALSE
specifies that whiskers be drawn from the upper edge of the box to the largest value within the upper fence, and from the lower edge of the box to the smallest value within the lower fence. This representation is sometimes called a schematic box-and-whisker plot or Tukey box-and-whisker plot.
TRUE
specifies that whiskers be drawn to the largest and smallest data values, whether these values are inside or outside the fences. The outliers and far outliers are not displayed and are not labeled. This representation is sometimes called a skeletal box-and-whisker plot.
Interaction: This option overrides the DATALABEL=, DATALABELATTRS=, LABELFAR=, OUTLIERATTRS=, and SPREAD= options.
Fences are locations above and below the box. The upper and lower fences are located at a distance of 1.5 times the interquartile range (IQR = Q3 - Q1) above the top edge and below the bottom edge of the box. The upper and lower far fences are located at a distance of 3 times the IQR from the same edges (see Example Program and Statement Details).
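The fence locations are not drawn on the plot, but they can be computed separately. The following sketch uses PROC MEANS and a DATA step. The data set names work.box_stats and work.box_fences and the variable names are assumptions, and the default quantile definition of PROC MEANS is assumed to match the default PERCENTILE=5.

```sas
/* Quartiles and interquartile range of city mileage per vehicle type */
proc means data=sashelp.cars noprint nway;
  class type;
  var mpg_city;
  output out=work.box_stats q1=q1 q3=q3 qrange=iqr;
run;

/* Fences at 1.5*IQR and far fences at 3*IQR beyond the box edges */
data work.box_fences;
  set work.box_stats;
  lower_fence     = q1 - 1.5*iqr;
  upper_fence     = q3 + 1.5*iqr;
  lower_far_fence = q1 - 3*iqr;
  upper_far_fence = q3 + 3*iqr;
  keep type q1 q3 iqr lower_fence upper_fence
       lower_far_fence upper_far_fence;
run;

proc print data=work.box_fences noobs;
run;
```

Observations outside lower_fence and upper_fence would be drawn as outliers when EXTREME=FALSE, and those outside the far fences would be the far outliers.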
FILLATTRS=style-element | style-element (fill-options) | (fill-options)
specifies the appearance of the interior fill area of the boxes. See General Syntax for Attribute Options for the syntax on using a style-element and Fill Options for available fill-options.
Default: The GraphDataDefault style element.
Interaction: For this option to have any effect, the fill must be enabled by the ODS style or the DISPLAY= option.
FREQ=numeric-column | expression
specifies a numeric column that provides frequencies for each observation read. If n is the value of the numeric-column for a given observation, then that observation is used n times for any statistical computation.
Default: Each observation is counted once.
Restriction: If the value of the numeric-column is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
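The following sketch shows one way the FREQ= option might be used with pre-summarized data. The data set work.scores, its values, and the template name boxplot_freq are all made up for illustration.

```sas
/* Hypothetical summarized data: each score occurs COUNT times */
data work.scores;
  input group $ score count;
  datalines;
A 10 3
A 12 5
A 15 2
A 30 1
B 11 4
B 14 6
B 18 3
;
run;

proc template;
  define statgraph boxplot_freq;            /* hypothetical template name */
    begingraph;
      entrytitle "Scores Weighted by Frequency";
      layout overlay;
        /* each observation is used COUNT times in the statistics */
        boxplot y=score x=group / freq=count;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.scores template=boxplot_freq;
run;
```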
LABELFAR=boolean
specifies whether all outliers or only far outliers are labeled. For more information about outliers, see the Example Program and Statement Details.
Default: FALSE
FALSE
the labels specified by the DATALABEL= option apply to both outliers and far outliers.
TRUE
the labels specified by the DATALABEL= option only apply to far outliers.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
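A sketch that combines DATALABEL=, DATALABELATTRS=, and LABELFAR= follows. The template name boxplot_labelfar and the text attributes are illustrative assumptions.

```sas
proc template;
  define statgraph boxplot_labelfar;        /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage with Labeled Far Outliers";
      layout overlay;
        boxplot y=mpg_city x=type /
          datalabel=make                              /* column that supplies the labels */
          labelfar=true                               /* label only the far outliers */
          datalabelattrs=(color=gray size=7pt style=italic)
          spread=true;                                /* separate outliers with equal values */
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_labelfar;
run;
```

Font properties that are not listed in DATALABELATTRS= (here, family and weight) are derived from the GraphDataText style element.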
LEGENDLABEL= "string"
specifies a label for use in a legend.
Default: The string specified on the NAME= option.
MEANATTRS=style-element | style-element (marker-options) | (marker-options)
specifies the attributes of the marker representing the mean within the box. See General Syntax for Attribute Options for the syntax on using a style-element and Marker Options for available marker-options.
Default: The GraphBoxMean style element.
Interaction: This option is ignored if the DISPLAY= option does not display the mean.
MEDIANATTRS=style-element | style-element (line-options) | (line-options)
specifies the appearance of the line representing the median within the box. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphBoxMedian style element.
Interaction: This option is ignored if the DISPLAY= option does not display the median.
NAME="string"
assigns a name to a plot statement for reference in other template statements.
Default: no default
Restriction: The string is case sensitive, cannot contain spaces, and must define a unique name within the template.
Interaction: The string is used as the default legend label if the LEGENDLABEL= option is not used.
The specified name is used primarily in legend statements to coordinate the use of colors and line patterns between the graph and the legend.
ORIENT= VERTICAL | HORIZONTAL
specifies the orientation of the Y axis and of the boxes.
Default: VERTICAL
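A horizontal box plot can be sketched as follows. The Y= argument still names the analysis column; only the orientation changes. The template name boxplot_horiz is hypothetical.

```sas
proc template;
  define statgraph boxplot_horiz;           /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage for Vehicle Types";
      layout overlay;
        /* Y= remains the analysis column; boxes are drawn horizontally */
        boxplot y=mpg_city x=type / orient=horizontal;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_horiz;
run;
```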
OUTLIERATTRS=style-element | style-element (marker-options) | (marker-options)
specifies the attributes of the markers representing the outliers. See General Syntax for Attribute Options for the syntax on using a style-element and Marker Options for available marker-options.
Default: The GraphOutlier style element.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
OUTLINEATTRS=style-element | style-element (line-options) | (line-options)
specifies the appearance of the box outline. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphOutlines style element.
PERCENTILE= 1 | 2 | 3 | 4 | 5
specifies one of five definitions used to calculate percentiles.
Default: 5 (empirical distribution function with averaging)
The percentile definitions and default are the same as used by PCTLDEF= option of PROC UNIVARIATE or the QNTLDEF= option of PROC SUMMARY.
Calculating Percentiles: You can specify one of five definitions for computing the percentiles with the PERCENTILE= option. Let $n$ be the number of nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of the variable. $x_1$ is the smallest value, $x_2$ is the next smallest, and $x_n$ is the largest value. Let the $t$th percentile be $y$, set $p = t/100$, and let
$np = j + g$ when PERCENTILE=1, 2, 3, or 5
$(n+1)p = j + g$ when PERCENTILE=4
where $j$ is the integer part of $np$ (or of $(n+1)p$ when PERCENTILE=4), and $g$ is the corresponding fractional part. Then the PERCENTILE= option defines the $t$th percentile, $y$, as described in the following table:
Percentile Definitions
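Because the percentile definitions are stated to match the QNTLDEF= option, one way to see the effect of a definition is to compare quartiles from PROC MEANS with a box plot that uses the same definition. This is a sketch only; the template name boxplot_pctl4 is hypothetical, and definition 4 is chosen arbitrarily.

```sas
/* Quartiles under the default definition (5) */
proc means data=sashelp.cars q1 median q3 qntldef=5 maxdec=2;
  class type;
  var mpg_city;
run;

/* Quartiles under definition 4 for comparison */
proc means data=sashelp.cars q1 median q3 qntldef=4 maxdec=2;
  class type;
  var mpg_city;
run;

proc template;
  define statgraph boxplot_pctl4;           /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage, PERCENTILE=4";
      layout overlay;
        /* box edges and median computed with percentile definition 4 */
        boxplot y=mpg_city x=type / percentile=4;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_pctl4;
run;
```

For large groups the two definitions usually give similar quartiles; differences are more likely to be visible for groups with few observations.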
PRIMARY=boolean
specifies that the data columns for this plot and the plot type be used for determining default axis features.
Default: FALSE
Restriction: This option is ignored if the plot is placed under a GRIDDED or LATTICE layout block.
Details: This option is needed only when two or more plots within an overlay-type layout contribute to a common axis. For more information, see When Plots Share Data and a Common Axis.
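As a sketch of the situation described above, the following program overlays two box plots that share both axes and marks the second one as primary. The template name boxplot_primary is hypothetical. The expectation, not verified here, is that default axis features such as the Y-axis label are then taken from the MPG_HIGHWAY column.

```sas
proc template;
  define statgraph boxplot_primary;         /* hypothetical template name */
    begingraph;
      entrytitle "City and Highway Mileage";
      layout overlay / cycleattrs=true;
        boxplot x=type y=mpg_city    / name="City"
          discreteoffset=-.2 boxwidth=.3;
        /* this plot determines the default axis features */
        boxplot x=type y=mpg_highway / name="Highway"
          discreteoffset=.2 boxwidth=.3 primary=true;
        discretelegend "City" "Highway";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_primary;
run;
```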
SPREAD=boolean
specifies whether outliers with the same value are spread out to avoid overlap. For vertical box plots, this means offsetting the outliers horizontally. If this option is false, outliers with the same value are plotted in the same position, so only one is visible.
Default: FALSE
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
WHISKERATTRS=style-element | style-element (line-options) | (line-options)
specifies the line properties of the whiskers and caps. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphBoxWhisker style element.
XAXIS=X | X2
specifies whether data are mapped to the primary X (bottom) axis or to the secondary X2 (top) axis.
Default: X
Interaction: This option is ignored if the X= argument is not specified.
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features are Determined.
YAXIS=Y | Y2
specifies whether data are mapped to the primary Y (left) axis or to the secondary Y2 (right) axis.
Default: Y
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features are Determined.
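Mapping a second box plot to the Y2 axis is useful when two analysis columns have very different scales. The following is a minimal sketch that assumes the SASHELP.CARS columns MPG_CITY and HORSEPOWER; the template name boxplot_y2 and the axis labels are illustrative.

```sas
proc template;
  define statgraph boxplot_y2;              /* hypothetical template name */
    begingraph;
      entrytitle "City Mileage and Horsepower by Vehicle Type";
      layout overlay / cycleattrs=true
          yaxisopts=(label="Miles Per Gallon (City)")
          y2axisopts=(label="Horsepower");
        /* left (Y) axis */
        boxplot x=type y=mpg_city   / name="City"
          discreteoffset=-.2 boxwidth=.3;
        /* right (Y2) axis */
        boxplot x=type y=horsepower / name="HP" legendlabel="Horsepower"
          yaxis=y2 discreteoffset=.2 boxwidth=.3;
        discretelegend "City" "HP";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=boxplot_y2;
run;
```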