
The SGPLOT Procedure

HBOX Statement


Creates a horizontal box plot that shows the distribution of your data.
Interaction: The HBOX statement cannot be used with other plot statements in the SGPLOT procedure.
Featured in: Creating a Horizontal Box Plot


Description

Horizontal and vertical box plots display the distribution of data by using a rectangular box and whiskers. Whiskers are lines that indicate a data range outside of the box.

[Figure: Parts of a Box Plot]

Parts of a Box Plot shows a diagram of a vertical box plot. The bottom and top edges of the box mark the interquartile range (IQR), that is, the range of values between the first and third quartiles (the 25th and 75th percentiles). The marker inside the box indicates the mean value, and the line inside the box indicates the median value.
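
To make these statistics concrete, here is a minimal Python sketch (not SAS code) that computes what one box draws: the quartiles, median, mean, and IQR. It assumes the default PERCENTILE=5 method, taken here to be the "empirical distribution function with averaging" definition from the Base SAS percentile documentation. The names pctl_def5 and box_summary are hypothetical.

```python
import math
from statistics import fmean


def pctl_def5(sorted_x, p):
    """Percentile of sorted data using the EDF-with-averaging rule.

    Assumed to match SAS percentile definition 5: write n*p = j + g,
    where j is the integer part and g the fractional part. Order
    statistics x_1..x_n are 1-based in the formula, 0-based in Python.
    Intended for 0 < p < 1.
    """
    n = len(sorted_x)
    j = math.floor(n * p)
    g = n * p - j
    if g == 0:
        # (x_j + x_(j+1)) / 2
        return (sorted_x[j - 1] + sorted_x[j]) / 2
    # x_(j+1)
    return sorted_x[j]


def box_summary(values):
    """Statistics drawn by one box (hypothetical helper)."""
    xs = sorted(values)
    q1 = pctl_def5(xs, 0.25)
    median = pctl_def5(xs, 0.50)
    q3 = pctl_def5(xs, 0.75)
    return {
        "q1": q1,           # lower edge of the box
        "median": median,   # line inside the box
        "q3": q3,           # upper edge of the box
        "iqr": q3 - q1,     # interquartile range
        "mean": fmean(xs),  # marker inside the box
    }
```

For the values 1 through 7 plus 100, this sketch gives Q1 = 2.5, median = 4.5, Q3 = 6.5, IQR = 4.0, and mean = 16.0, which shows how one large value pulls the mean marker away from the median line.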

The elements outside the box depend on the options that you specify. By default, the whiskers that extend from each box show the range of values that lie outside the interquartile range but are close enough to the box not to be considered outliers (a distance less than or equal to 1.5*IQR). If you specify the EXTREME option, then the whiskers show the entire range of values, including outliers.
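
The whisker rule can be sketched in Python as follows. This illustrates the rule stated above and is not the SGPLOT implementation; whisker_ends is a hypothetical name, and the quartiles are passed in rather than computed.

```python
def whisker_ends(values, q1, q3, extreme=False):
    """Return the (low, high) whisker endpoints for one box.

    Default: each whisker stops at the most extreme data value that is
    no more than 1.5*IQR outside the box. extreme=True mirrors the
    EXTREME option: the whiskers reach the data minimum and maximum.
    Assumes q1 and q3 were computed from the same values, so at least
    one value lies inside the fences.
    """
    if extreme:
        return min(values), max(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values inside the fences are not outliers, so a whisker may reach them.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)
```

With Q1 = 2.5 and Q3 = 6.5 (IQR = 4), the fences are at -3.5 and 12.5, so for the values 1 through 7 plus 100 the default whiskers run from 1 to 7, while extreme=True stretches the upper whisker to 100.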

Any points that are more than 1.5*IQR from the box are considered outliers. By default, these points are indicated by markers. If you specify the DATALABEL option, then the outlier points have data labels. If you also specify the LABELFAR option, then only the outliers that are more than 3*IQR from the box have data labels.
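
A minimal Python sketch of how outliers and their labels could be decided under these rules follows. The function classify_outliers and its parameters are hypothetical: datalabel stands in for the DATALABEL option, label_values for DATALABEL=variable, and labelfar for LABELFAR.

```python
def classify_outliers(values, q1, q3, datalabel=False, label_values=None,
                      labelfar=False):
    """Return (value, label) pairs for the outliers of one box.

    label is None when the point gets a marker but no data label.
    label_values defaults to the response values themselves.
    """
    iqr = q3 - q1
    if label_values is None:
        label_values = values
    outliers = []
    for v, lab in zip(values, label_values):
        # Distance from the nearest edge of the box (0 inside the box).
        distance = max(q1 - v, v - q3, 0)
        if distance <= 1.5 * iqr:
            continue  # within whisker range, not an outlier
        is_far = distance > 3 * iqr
        # With labelfar, only far outliers keep their labels.
        show = datalabel and (is_far or not labelfar)
        outliers.append((v, lab if show else None))
    return outliers
```

With Q1 = 2.5 and Q3 = 6.5 (IQR = 4), a value of 13 lies 6.5 beyond the box: that is more than 1.5*IQR (6) but not more than 3*IQR (12), so it is an outlier whose label LABELFAR suppresses, whereas 100 is a far outlier and keeps its label.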


Syntax

HBOX response-variable </ option(s)>;

option(s) can be one or more of the following options:


Required Arguments

response-variable

specifies the response variable for the plot. If you do not specify the CATEGORY= option, then one box is created for the response variable.


Options

BOXWIDTH= numeric-value

specifies the width of the box. Specify a value between 0.0 (0% of the available width) and 1.0 (100% of the available width).

Default: 0.4

CATEGORY= category-variable

specifies the category variable for the plot. A box plot is created for each distinct value of the category variable.
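
The one-box-per-category behavior can be sketched in Python as a grouping step. This is an illustration, not SAS code: group_by_category is a hypothetical name, None stands in for a missing category value (an assumption of this sketch), and the missing flag mirrors the MISSING option described in this section.

```python
from collections import defaultdict


def group_by_category(responses, categories, missing=False):
    """Collect the response values that feed each box.

    One group is created per distinct category value. Observations whose
    category is None (missing) are dropped unless missing=True, in which
    case they form their own box.
    """
    groups = defaultdict(list)
    for y, c in zip(responses, categories):
        if c is None and not missing:
            continue
        groups[c].append(y)
    return dict(groups)
```

Each resulting list would then be summarized separately to draw its own box.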

DATALABEL <= variable>

adds data labels for the outlier markers. If you specify a variable, then the values of that variable are used for the data labels. Otherwise, the values of the response variable are used.

Note: This option has no effect if the plot does not contain outlier points.

EXTREME

specifies that the whiskers can extend to the maximum and minimum values for the response variable, and that outliers are not identified. When you do not specify the EXTREME option, the whiskers cannot be longer than 1.5 times the length of the box.

FREQ= numeric-variable

specifies that each observation is repeated n times for computational purposes, where n is the value of the numeric variable. If n is not an integer, then it is truncated to an integer. If n is less than 1 or is missing, then the observation is excluded from the analysis.

Interaction: If your plot is overlaid with other categorization plots, then the first FREQ variable that you specified is used for all of the plots.
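
The FREQ= rules above amount to expanding the data before the box statistics are computed. Here is a minimal Python sketch of that expansion (not SAS code); expand_freq is a hypothetical name, and None stands in for a missing frequency.

```python
import math


def expand_freq(values, freqs):
    """Repeat each value n times, where n is its frequency.

    Non-integer frequencies are truncated; observations whose frequency
    is missing (None) or less than 1 are dropped.
    """
    expanded = []
    for v, f in zip(values, freqs):
        if f is None:
            continue  # missing frequency: observation excluded
        n = math.trunc(f)
        if n < 1:
            continue  # frequency below 1: observation excluded
        expanded.extend([v] * n)
    return expanded
```

For example, frequencies of 2, 1.9, 0.5, and missing keep the first value twice, the second value once, and drop the last two observations.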

LABELFAR

specifies that only the far outliers have data labels. Far outliers are points whose distance from the box is more than three times the length of the box.

Note: This option has no effect if you do not specify the DATALABEL option or if there are no far outliers.

LEGENDLABEL= "text-string"

specifies a label that identifies the box plot in the legend. By default, the label of the response variable is used.

MISSING

processes missing values as a valid category value and creates a box for it.

NAME= "text-string"

specifies a name for the plot. You can use the name to refer to this plot in other statements.

PERCENTILE= 1 | 2 | 3 | 4 | 5

specifies a method for computing the percentiles for the plot.

For descriptions of each method, see Calculating Percentiles in Base SAS Procedures Guide: Statistical Procedures.

Default: 5
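
To show why the choice of method can move the box edges, here is a Python sketch (not SAS code) of two of the five methods. It assumes that method 3 is the "empirical distribution function" definition and method 5 the "empirical distribution function with averaging" definition, as described in the Base SAS percentile documentation; methods 1, 2, and 4 are not sketched, and percentile is a hypothetical name.

```python
import math


def percentile(sorted_x, p, definition=5):
    """Percentile of sorted data under definition 3 or 5 (0 < p < 1).

    Write n*p = j + g (j integer part, g fractional part), with 1-based
    order statistics x_1..x_n:
      definition 3: x_j if g == 0, else x_(j+1)
      definition 5: (x_j + x_(j+1)) / 2 if g == 0, else x_(j+1)
    """
    if definition not in (3, 5):
        raise ValueError("only definitions 3 and 5 are sketched here")
    n = len(sorted_x)
    j = math.floor(n * p)
    g = n * p - j
    if g > 0:
        return sorted_x[j]  # x_(j+1); the two definitions agree here
    if definition == 3:
        return sorted_x[j - 1]  # x_j
    return (sorted_x[j - 1] + sorted_x[j]) / 2
```

For the values 1 through 8, Q1 (p = 0.25, so n*p = 2 exactly) is 2 under definition 3 and 2.5 under definition 5; the two methods agree whenever n*p is not an integer.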

SPREAD

relocates outlier points that have identical values to prevent overlapping.

Note: This option has no effect if your data does not contain two or more outliers with identical values for the response variable.

TRANSPARENCY= value

specifies the degree of transparency for the plot. Specify a value from 0.0 (completely opaque) to 1.0 (completely transparent).

Default: 0.0

X2AXIS

assigns the response variable to the secondary (top) horizontal axis.

Y2AXIS

assigns the category variable to the secondary (right) vertical axis.
