Example Program and Statement Details

Example Graph

The following graph was generated by the Example Program:
Example Box Plot Graph Using Parameters

Example Program

proc template;
  define statgraph boxplotparm1;
    begingraph;
      entrytitle "City Mileage for Vehicle Types";
      layout overlay;
        boxplotparm y=value x=x stat=stat /
          datalabel=datalabel spread=true;
      endlayout;
    endgraph;
  end;
run;
 
proc sgrender data=boxdata template=boxplotparm1;
run;
The following input data generated the box for Sedan in the graph. See Generalized Macro for BOXPLOTPARM Data for the code that creates all of the data.
STAT      X          VALUE    DATALABEL
 ...      
N         Sedan        262
MEAN      Sedan    21.0840
MEDIAN    Sedan         20 
Q1        Sedan         18 
Q3        Sedan         24 
STD       Sedan     4.2346
OUTLIER   Sedan         36    Honda
OUTLIER   Sedan         35    Toyota
OUTLIER   Sedan         35    Toyota
OUTLIER   Sedan         38    Volkswagen
MIN       Sedan         12 
MAX       Sedan         33 
 ...
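To try the Example Program with only the rows listed above, a DATA step along the following lines could build a partial BOXDATA data set. This is a minimal sketch, not the Generalized Macro: it contains just the Sedan statistics from the listing (the elided rows for the other vehicle types are not reproduced), and the column lengths are assumptions.

```sas
/* Partial input data: only the Sedan rows listed above.           */
/* Column lengths are assumptions chosen to fit the listed values. */
data boxdata;
  infile datalines truncover;   /* rows without a DATALABEL leave it missing */
  length stat $10 x $12 datalabel $12;
  input stat $ x $ value datalabel $;
  datalines;
N         Sedan        262
MEAN      Sedan    21.0840
MEDIAN    Sedan         20
Q1        Sedan         18
Q3        Sedan         24
STD       Sedan     4.2346
OUTLIER   Sedan         36    Honda
OUTLIER   Sedan         35    Toyota
OUTLIER   Sedan         35    Toyota
OUTLIER   Sedan         38    Volkswagen
MIN       Sedan         12
MAX       Sedan         33
;
run;
```

With this data set, the PROC SGRENDER step in the Example Program should draw a single Sedan box rather than one box per vehicle type.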

Statement Summary

The BOXPLOTPARM statement requires pre-computed input data. One reason to choose this statement over the BOXPLOT statement is that you can control the computational technique used to define the box plot components: the mean, quartiles, location of fences, outlier definition, and so on. See Generalized Macro for BOXPLOTPARM Data for examples of such computations using PROC SUMMARY and DATA steps.
The BOXPLOTPARM statement displays a single box if given just Y and a STAT argument. It displays multiple boxes if given both Y and X and a STAT argument and X has more than one unique value.
Two basic box plot representations can be drawn with the BOXPLOTPARM statement: a schematic (Tukey) box plot and a skeletal box plot. See the EXTREME= option for details.
The following figure illustrates the box plot elements:
box plot elements
As shown in the figure, the bottom and top edges of the box are located at the 1st quartile (25th percentile) and 3rd quartile (75th percentile) of the sample. Within the box you can display the median (50th percentile) as a line and the mean as a marker (see the DISPLAY= option).
You can also display markers and data labels for outliers. Outliers are observations that are more extreme than the upper and lower fences (located 1.5 × IQR above Q3 and below Q1, where IQR = Q3 - Q1). Outliers that are beyond the upper and lower far fences (located 3 × IQR above Q3 and below Q1) are called far outliers and can also be identified and labeled. From a graphical perspective, the locations of the fences along the axis are known, but no line or marker displays a fence. (See the DISPLAY=, LABELFAR=, and DATALABEL= options.)
Finally, you can control the range represented by the whiskers. By default, the whiskers are drawn from the upper edge of the box to the MAX value, and from the lower edge of the box to the MIN value (see EXTREME= option).
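As a worked check of the fence definitions, the following DATA step computes the fence locations for the Sedan statistics in the Example Program (Q1 = 18, Q3 = 24), using the 1.5 × IQR and 3 × IQR distances given under the STAT= argument. The variable names are made up for this sketch.

```sas
data _null_;
  q1 = 18;  q3 = 24;               /* Sedan quartiles from the example data */
  iqr = q3 - q1;                   /* interquartile range                   */
  lower_fence     = q1 - 1.5*iqr;  /* fences: 1.5 x IQR beyond the box      */
  upper_fence     = q3 + 1.5*iqr;
  lower_far_fence = q1 - 3*iqr;    /* far fences: 3 x IQR beyond the box    */
  upper_far_fence = q3 + 3*iqr;
  put lower_fence= upper_fence= lower_far_fence= upper_far_fence=;
run;
```

The fences come out at 9 and 33 and the far fences at 0 and 42. This agrees with the example data: MAX is 33, and the listed values 35, 36, and 38 lie between the upper fence and the upper far fence, so they are recorded as OUTLIER rather than FAROUTLIER.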

Input Data Requirements for the BOXPLOTPARM Statement

At a minimum, valid data for the BOXPLOTPARM statement must provide a numeric column (Y=) that contains calculated statistics for an analysis, and a string column (STAT=) that identifies each statistic. The Y column must contain nonmissing values for the Q1 (25th percentile) and Q3 (75th percentile) statistics. If Y values are missing or not supplied for other statistic values, then those statistics are not displayed in the plot, regardless of syntax requests to display them.
For example, a petroleum company uses a turbine to heat water into steam that is pumped into the ground to make oil less viscous and easier to extract. This process occurs 20 times daily, and the amount of power (in kilowatts) used to heat the water to the desired temperature is recorded. The following data show the statistics that are calculated for one day of this process:
PowerOutputs    Statistic
     3180.00    MIN
     3340.00    Q1
     3487.40    MEAN
     3490.00    MEDIAN
     3610.00    Q3
     4050.00    MAX
       20.00    N
To plot the data from the preceding table, the following BOXPLOTPARM statement uses the Y= and STAT= arguments to generate a single box plot for the recorded statistics:
BOXPLOTPARM Y=PowerOutputs STAT=Statistic; 
Graph with Single Box Plot
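For context, the statement above could be placed in a complete program such as the following. This is a sketch only: the data set name POWER and the template name SINGLEBOX are hypothetical, and the data values are the ones from the preceding table.

```sas
/* Hypothetical names: data set POWER, template SINGLEBOX */
data power;
  length Statistic $10;
  input PowerOutputs Statistic $;
  datalines;
3180.00 MIN
3340.00 Q1
3487.40 MEAN
3490.00 MEDIAN
3610.00 Q3
4050.00 MAX
20.00   N
;
run;

proc template;
  define statgraph singlebox;
    begingraph;
      layout overlay;
        /* no X= argument, so a single box is drawn */
        boxplotparm y=PowerOutputs stat=Statistic;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=power template=singlebox;
run;
```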
If the data contain statistics for multiple days of the process, a third column in the data must be present to identify the days that the statistics were recorded. For example, the following data show the statistics that are calculated for two days of this process:
Day      PowerOutputs    Statistic
04JUL         3180.00    MIN
04JUL         3340.00    Q1
04JUL         3487.40    MEAN
04JUL         3490.00    MEDIAN
04JUL         3610.00    Q3
04JUL         4050.00    MAX
04JUL           20.00    N
05JUL         3179.00    MIN
05JUL         3333.50    Q1
05JUL         3471.65    MEAN
05JUL         3419.50    MEDIAN
05JUL         3605.00    Q3
05JUL         3849.00    MAX
05JUL           20.00    N
To plot the data from the preceding table, the BOXPLOTPARM statement needs the Y=, STAT=, and X= arguments to generate a separate box plot for each day that the statistics were recorded:
   BOXPLOTPARM Y=PowerOutputs STAT=Statistic X=Day; 
Graph with Two Box Plots
See Generalized Macro for BOXPLOTPARM Data for a more complete example of providing input data for BOXPLOTPARM.
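The general shape of such a computation can be sketched with PROC SUMMARY and a DATA step. The code below is an illustration under stated assumptions, not the Generalized Macro itself: it uses the SASHELP.CARS sample data, the work data set names CARS and STATS are hypothetical, the quartiles use the PROC SUMMARY default definition, and the fences use the 1.5 × IQR and 3 × IQR rules described for the OUTLIER and FAROUTLIER statistics.

```sas
/* Hypothetical work data sets: CARS, STATS. Output: BOXDATA */
proc sort data=sashelp.cars(keep=type make mpg_city) out=cars;
  by type;
run;

/* One row of summary statistics per vehicle type */
proc summary data=cars;
  by type;
  var mpg_city;
  output out=stats(drop=_type_ _freq_)
         n=n mean=mean median=median q1=q1 q3=q3 std=std;
run;

/* Reshape to one row per statistic and classify each observation */
data boxdata(keep=stat x value datalabel);
  length stat $10 x $8 datalabel $13;
  merge cars stats;
  by type;
  retain lo hi;                 /* smallest and largest values inside the fences */
  x = type;
  datalabel = ' ';
  if first.type then do;
    lo = .;  hi = .;
    stat = 'N';      value = n;      output;
    stat = 'MEAN';   value = mean;   output;
    stat = 'MEDIAN'; value = median; output;
    stat = 'Q1';     value = q1;     output;
    stat = 'Q3';     value = q3;     output;
    stat = 'STD';    value = std;    output;
  end;
  iqr = q3 - q1;
  if mpg_city > q3 + 3*iqr or . < mpg_city < q1 - 3*iqr then do;
    stat = 'FAROUTLIER'; value = mpg_city; datalabel = make; output;
  end;
  else if mpg_city > q3 + 1.5*iqr or . < mpg_city < q1 - 1.5*iqr then do;
    stat = 'OUTLIER'; value = mpg_city; datalabel = make; output;
  end;
  else if not missing(mpg_city) then do;
    lo = min(lo, mpg_city);     /* MIN and MAX functions ignore missing values */
    hi = max(hi, mpg_city);
  end;
  if last.type then do;
    datalabel = ' ';
    stat = 'MIN'; value = lo; output;
    stat = 'MAX'; value = hi; output;
  end;
run;
```

Because the fences are computed in your own code, you could replace the 1.5 and 3 multipliers, or the quartile definition, with whatever technique your analysis calls for.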

Arguments

Y=numeric-column | expression
specifies the column for the Y values. The Y values must be the statistical values needed for the box plot. At a minimum, there must be nonmissing values for the 25th and 75th percentiles.
X=column | expression
specifies the column for the X values. The X values must qualify or classify the values in the Y column. This optional argument is used to create a box for each unique classifier value.
STAT=string-column
specifies the statistic that is represented by the value in the Y column. Valid STAT= values include the following:
Q1
1st quartile (25th percentile). The data must contain a nonmissing value for this quartile.
Q3
3rd quartile (75th percentile). The data must contain a nonmissing value for this quartile.
MAX
maximum data value less than or equal to the upper fence.
MIN
minimum data value greater than or equal to the lower fence.
MEAN
data mean.
MEDIAN
data median.
OUTLIER
an observation outside the lower and upper fences. The fences are located at a distance 1.5 times the Interquartile Range (IQR = Q3 - Q1) above and below the box. The outliers are labeled when the DATALABEL= option is used.
FAROUTLIER
an observation outside the lower and upper far fences. The far fences are located at a distance 3 times the Interquartile Range (IQR = Q3 - Q1) above and below the box. The far outliers are labeled when the DATALABEL= option is used. Specify LABELFAR=TRUE to label only the far outliers and not the other outliers.
N
subgroup sample size. The N value is not shown in the plot but is used to calculate notch locations when the DISPLAY= option displays notches.
STD
data standard deviation.
Requirement: Other STAT values can be omitted or have missing Y values, but if present, must conform to the following rules for the plot to be displayed:
Q1 <= MEDIAN <= Q3
MIN <= MAX
STD >= 0
N > 0
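One way to check these rules before rendering is to reshape the statistics to one row per box and test each condition. The following is a rough sketch: the data set names SORTED and WIDE are hypothetical, the input is assumed to have the STAT, X, and VALUE columns of the BOXDATA example with uppercase STAT values, and each box is assumed to have at most one row per statistic.

```sas
/* Hypothetical names: SORTED, WIDE. Assumes columns STAT, X, VALUE. */
proc sort data=boxdata out=sorted;
  by x;
run;

/* One row per box, one column per statistic (outlier rows are excluded) */
proc transpose data=sorted out=wide(drop=_name_);
  where stat in ('Q1' 'Q3' 'MEDIAN' 'MIN' 'MAX' 'STD' 'N');
  by x;
  id stat;
  var value;
run;

/* Write a message to the log for each rule that is violated */
data _null_;
  set wide;
  if missing(q1) or missing(q3) then
    put 'Q1 or Q3 is missing for ' x=;
  if not missing(median) and not (q1 <= median <= q3) then
    put 'MEDIAN is outside Q1..Q3 for ' x=;
  if not missing(min) and not missing(max) and min > max then
    put 'MIN exceeds MAX for ' x=;
  if not missing(std) and std < 0 then
    put 'STD is negative for ' x=;
  if not missing(n) and n <= 0 then
    put 'N is not positive for ' x=;
run;
```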

Options

Statement Option    Description
BOXWIDTH            Specifies the width of a box as a ratio of the maximum possible width.
CAPSHAPE            Specifies the shape at the ends of the whiskers.
CONNECT             Specifies that a connect line joins a statistic from box to box.
CONNECTATTRS        Specifies the properties of the line connecting multiple boxes.
DATALABEL           Specifies the labels of the outliers.
DATALABELATTRS      Specifies the color and font attributes of the outlier labels.
DATATRANSPARENCY    Specifies the degree of the transparency of the box outlines, box fill, whiskers, mean, median, caps, connect lines, and outliers, if displayed.
DISCRETEOFFSET      Specifies an amount to offset all boxes from the discrete X ticks.
DISPLAY             Specifies the box plot features to display.
EXTREME             Specifies whether the whiskers can extend beyond the fences.
FILLATTRS           Specifies the appearance of the interior fill area of the boxes.
LABELFAR            Specifies whether all outliers or only far outliers are labeled.
LEGENDLABEL         Specifies the label to be used in a legend.
MEANATTRS           Specifies the attributes of the marker that represents the mean values.
MEDIANATTRS         Specifies the properties of the line that represents the median values.
NAME                Assigns a name to a plot statement for reference in other template statements.
ORIENT              Specifies the orientation of the Y axis and of the boxes.
OUTLIERATTRS        Specifies the attributes of the outlier markers.
OUTLINEATTRS        Specifies the line properties of the box outlines.
PRIMARY             Specifies that the data columns and plot type for this plot be used for determining default axis features.
SPREAD              Specifies whether outliers with the same value are spread out to avoid overlap.
WHISKERATTRS        Specifies the line properties of the whiskers and caps.
XAXIS               Specifies whether data are mapped to the primary X (bottom) axis or the secondary X2 (top) axis.
YAXIS               Specifies whether data are mapped to the primary Y (left) axis or the secondary Y2 (right) axis.
BOXWIDTH=number
specifies the width of a box as a ratio of the maximum possible width.
Default: .4
Range: 0 (narrowest) to 1 (widest)
CAPSHAPE=SERIF | LINE | BRACKET
specifies the shape at the ends of the whiskers.
Default: The GraphBox:CapStyle style reference.
SERIF
specifies a short line perpendicular to the whisker.
LINE
specifies a line perpendicular to the whisker extending the width of the box.
BRACKET
specifies a line perpendicular to the whisker extending the width of the box with short extensions at the ends drawn in the direction of the box.
Interaction: The cap color and the thickness are specified by the WHISKERATTRS= option. The cap pattern is always solid.
Interaction: The DISPLAY= option must include CAPS for cap lines to be shown.
CONNECT= MEAN | MEDIAN | Q1 | Q3 | MIN | MAX
specifies that a connect line joins a statistic from box to box.
Default: The GraphBox:Connect style reference.
Requirement: The DISPLAY= option must contain the CONNECT suboption for the connect line to be displayed.
Interaction: This option only applies when the X= argument is used to generate multiple boxes.
CONNECTATTRS=style-element | style-element (line-options) | (line-options)
specifies the attributes of the lines connecting multiple boxes. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphConnectLine style element.
Interaction: If there is only one box, this option is ignored.
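As an illustration of how CONNECT=, CONNECTATTRS=, and DISPLAY= work together, the following statement joins the means of the boxes with a dashed line. It is a sketch that reuses the Day, PowerOutputs, and Statistic columns from the two-day example; the line attribute values are arbitrary choices.

```sas
boxplotparm y=PowerOutputs stat=Statistic x=Day /
  display=(caps fill mean median outliers connect)  /* CONNECT must be listed */
  connect=mean                                      /* join the box means     */
  connectattrs=(pattern=dash thickness=2);          /* illustrative values    */
```

Without CONNECT in the DISPLAY= list, the CONNECT= and CONNECTATTRS= settings would have no visible effect.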
DATALABEL=column
specifies the labels of the values that are identified as OUTLIER or FAROUTLIER by the STAT= column. Either a numeric or a character column can be used.
Default: no default
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
See also: LABELFAR= option
DATALABELATTRS=style-element | style-element (text-options) | (text-options)
specifies the color and font attributes of the outlier labels. See General Syntax for Attribute Options for the syntax on using a style-element and Text Options for available text-options.
Default: The GraphDataText style element.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
Interaction: If the specified text options do not include all the font properties (color, family, size, weight, style), the unspecified properties are derived from the GraphDataText style element.
DATATRANSPARENCY=number
specifies the degree of the transparency of the box outlines, box fill, whiskers, mean, median, caps, connect lines, and outliers, if displayed.
Default: 0
Range: 0 (opaque) to 1 (entirely transparent).
DISCRETEOFFSET=number
specifies an amount to offset all boxes from the discrete X ticks.
Note: This feature is for the third maintenance release of SAS 9.2 and later.
Default: 0 (no offset, all boxes are centered on the discrete ticks)
Range: -0.5 to +0.5, where .5 represents half the distance between discrete ticks. A positive offset is to the right when ORIENT=VERTICAL, and up when ORIENT=HORIZONTAL. (If the layout's axis options set REVERSE=TRUE, then the offset direction is also reversed.)
Details: This feature is useful for graphing multiple response variables side by side on a common axis. By default within an overlay-type layout, if multiple BOXPLOTPARM statements are used with different analysis variables, the boxes for matching X values are centered on the ticks. Depending on the data, the boxes might be superimposed. The following code fragment shows the default box positioning:
layout overlay / cycleattrs=true 
    yaxisopts=(label="Miles Per Gallon");

  boxplotparm x=type y=mpg_city stat=y_stat    / name="City" ;
  boxplotparm x=type y=mpg_highway stat=y_stat / name="Highway" ;

  discretelegend "City" "Highway";
endlayout;
Box Plot with Boxes Centered on Discrete X Ticks
To place the different response values side by side, you can assign a different offset to each BOXPLOTPARM statement. The BOXWIDTH= option can be used in conjunction with the DISCRETEOFFSET= option to create narrower boxes when desired.
layout overlay / cycleattrs=true 
    yaxisopts=(label="Miles Per Gallon");

  boxplotparm x=type y=mpg_city stat=y_stat    / name="City"
    discreteoffset=.2 ;
  boxplotparm x=type y=mpg_highway stat=y_stat / name="Highway"
    discreteoffset=-.2 ;

  discretelegend "City" "Highway";
endlayout;
Box Plot with Tick Offsets
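To combine the two options, BOXWIDTH= can be added to each statement so that the offset boxes do not touch. The fragment below is a sketch based on the preceding code; the value .3 is an illustrative choice, not one taken from the original example.

```sas
layout overlay / cycleattrs=true
    yaxisopts=(label="Miles Per Gallon");

  /* boxwidth=.3 is an illustrative value that narrows each box */
  boxplotparm x=type y=mpg_city stat=y_stat    / name="City"
    discreteoffset=.2 boxwidth=.3;
  boxplotparm x=type y=mpg_highway stat=y_stat / name="Highway"
    discreteoffset=-.2 boxwidth=.3;

  discretelegend "City" "Highway";
endlayout;
```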
DISPLAY=STANDARD | ALL | ( display-options )
specifies which additional features of the box plot to display.
Default: The GraphBox:DisplayOpts style reference. If this style element does not exist, the default is STANDARD.
STANDARD
displays this combination of features (CAPS FILL MEAN MEDIAN OUTLIERS)
ALL
displays all features
display-options
a list of features to be displayed. The list must be enclosed in parentheses and can include any of the following:
CAPS displays caps at the ends of the whiskers
CONNECT displays the line connecting multiple boxes
FILL displays filled boxes
MEAN displays the mean symbol within the box
MEDIAN displays the median line within the box
NOTCHES displays notched boxes
OUTLIERS displays markers for the outliers
Restriction: The display features requested can be displayed only if the input data includes this information.
Interaction: If EXTREME=TRUE, then the OUTLIERS feature is ignored.
To control the appearance of these features, use the CONNECTATTRS=, FILLATTRS=, MEANATTRS=, MEDIANATTRS=, OUTLIERATTRS=, and WHISKERATTRS= options. The WHISKERATTRS= option affects both the caps and the whiskers.
Details: The endpoints of the notches are at the following computed locations:
median ± 1.58 × (IQR / √N)
In the equation, IQR is the interquartile range and N is the sample size.
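As a worked illustration, the following DATA step computes notch endpoints for the one-day power data shown earlier (MEDIAN = 3490, Q1 = 3340, Q3 = 3610, N = 20). It assumes the commonly used notch formula, median ± 1.58 × IQR / sqrt(N); the variable names are made up for this sketch.

```sas
data _null_;
  median = 3490;  q1 = 3340;  q3 = 3610;  n = 20;  /* one-day power statistics */
  half_width = 1.58 * (q3 - q1) / sqrt(n);         /* assumed notch formula    */
  notch_low  = median - half_width;
  notch_high = median + half_width;
  put notch_low= notch_high=;
run;
```

Under that assumption the notch spans roughly 3394.6 to 3585.4. The N row in the input data is what makes this calculation possible, so notches cannot be drawn if N is missing.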
EXTREME=boolean
specifies whether the whiskers can extend beyond the fences.
Default: FALSE
FALSE
specifies that whiskers be drawn from the upper edge of the box to the largest value within the upper fence, and from the lower edge of the box to the smallest value within the lower fence. This representation is sometimes called a schematic box-and-whisker plot or Tukey box-and-whisker plot.
TRUE
specifies that whiskers be drawn to the largest and smallest data values, whether these values are inside or outside the fences. The outliers and far outliers are not displayed and are not labeled. This representation is sometimes called a skeletal box-and-whisker plot.
Interaction: This option overrides the DATALABEL=, DATALABELATTRS=, LABELFAR=, OUTLIERATTRS=, and SPREAD= options.
Fences are locations above and below the box. The upper and lower fences are located at a distance 1.5 times the Interquartile Range (IQR = Q3 - Q1) above and below the box. The upper and lower far fences are located at a distance 3 times the IQR above and below the box (see Example Program and Statement Details).
FILLATTRS=style-element | style-element (fill-options) | (fill-options)
specifies the appearance of the interior fill area of the boxes. See General Syntax for Attribute Options for the syntax on using a style-element and Fill Options for available fill-options.
Default: The GraphDataDefault style element.
Interaction: For this option to have any effect, the fill must be enabled by the ODS style or the DISPLAY= option.
LABELFAR=boolean
specifies whether all outliers or only far outliers are labeled. For more information about outliers, see the Example Program and Statement Details.
Default: FALSE
FALSE
the labels specified by the DATALABEL= option apply to both outliers and far outliers.
TRUE
the labels specified by the DATALABEL= option only apply to far outliers.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
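For example, the BOXPLOTPARM statement from the Example Program could be adjusted as follows so that only rows whose STAT value is FAROUTLIER receive labels. This is a sketch that assumes the same VALUE, X, STAT, and DATALABEL columns as the example data.

```sas
boxplotparm y=value x=x stat=stat /
  datalabel=datalabel   /* column that holds the label text          */
  labelfar=true;        /* label FAROUTLIER rows only, not OUTLIER   */
```

With the Sedan rows shown in the example data, which contain OUTLIER rows but no FAROUTLIER rows, this setting would leave the outlier markers unlabeled.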
LEGENDLABEL= "string"
specifies a label for use in a legend.
Default: The string specified on the NAME= option.
MEANATTRS=style-element | style-element (marker-options) | (marker-options)
specifies the attributes of the marker representing the mean within the box. See General Syntax for Attribute Options for the syntax on using a style-element and Marker Options for available marker-options.
Default: The GraphBoxMean style element.
Interaction: This option is ignored if the DISPLAY= option does not display the mean.
MEDIANATTRS=style-element | style-element (line-options) | (line-options)
specifies the appearance of the line representing the median within the box. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphBoxMedian style element.
Interaction: This option is ignored if the DISPLAY= option does not display the median.
NAME="string"
assigns a name to a plot statement for reference in other template statements.
Default: no default
Restriction: The string is case sensitive, cannot contain spaces, and must define a unique name within the template.
Interaction: The string is used as the default legend label if the LEGENDLABEL= option is not used.
The specified name is used primarily in legend statements to coordinate the use of colors and line patterns between the graph and the legend.
ORIENT= VERTICAL | HORIZONTAL
specifies the orientation of the Y axis and of the boxes.
Default: VERTICAL
OUTLIERATTRS=style-element | style-element (marker-options) | (marker-options)
specifies the attributes of the markers representing the outliers. See General Syntax for Attribute Options for the syntax on using a style-element and Marker Options for available marker-options.
Default: The GraphOutlier style element.
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
OUTLINEATTRS=style-element | style-element (line-options) | (line-options)
specifies the appearance of the box outline. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphOutlines style element.
PRIMARY=boolean
specifies that the data columns for this plot and the plot type be used for determining default axis features.
Default: FALSE
Restriction: This option is ignored if the plot is placed under a GRIDDED or LATTICE layout block.
Details: This option is needed only when two or more plots within an overlay-type layout contribute to a common axis. For more information, see When Plots Share Data and a Common Axis.
SPREAD=boolean
specifies whether outliers with the same value are spread out to avoid overlap. For vertical box plots, this means offsetting the outliers horizontally. If this option is false, outliers with the same value are plotted in the same position, so only one of them is visible.
Default: FALSE
Interaction: This option is ignored if EXTREME=TRUE or the DISPLAY= option does not display the outliers.
WHISKERATTRS=style-element | style-element (line-options) | (line-options)
specifies the line properties of the whiskers and caps. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphBoxWhisker style element.
XAXIS=X | X2
specifies whether data are mapped to the primary X (bottom) axis or to the secondary X2 (top) axis.
Default: X
Interaction: This option is ignored if the X= argument is not specified.
Interaction: The overall plot specification and the layout type determine the axis display for the specified axis. For more information, see How Axis Features are Determined.
YAXIS=Y | Y2
specifies whether data are mapped to the primary Y (left) axis or to the secondary Y2 (right) axis.
Default: Y
Interaction: The overall plot specification and the layout type determine the axis display for the specified axis. For more information, see How Axis Features are Determined.