The PARETO Procedure

Dictionary of HBAR and VBAR Statement Options

This section provides detailed descriptions of options you can specify after the slash (/) in the HBAR and VBAR statements. For example, to request that the frequency axis of a vertical Pareto chart be scaled by counts, use the SCALE= option as follows:

proc pareto data=failure;
   vbar cause / scale = count;
run;

This section consists of the following subsections:

  • The section General Options contains descriptions of general Pareto chart options.

  • The section Options for Traditional Graphics describes options that apply only when traditional graphics output is produced, as when ODS Graphics is disabled.

  • The section Options for Legacy Line Printer Charts contains descriptions of options that apply only to legacy line printer charts, which are produced by VBAR statements when you specify the LINEPRINTER option in the PROC PARETO statement.

Note: The terminology used in the option descriptions describes vertical Pareto charts. For example, the "tallest" bar is the one that extends farthest along the frequency axis, whether it is oriented vertically or horizontally.

General Options

You can specify the following general options:

ANCHOR=keyword

specifies where the Pareto curve is anchored to the first bar on the chart. Table 15.7 describes the position keywords available in the HBAR and VBAR statements.

Table 15.7: ANCHOR= Option Keywords

HBAR Keyword    Anchoring Position
BR              Bottom right corner (default)
LC              Left center
RC              Right center
TL              Top left corner

VBAR Keyword    Anchoring Position
BC              Bottom center
BL              Bottom left corner
TC              Top center
TR              Top right corner (default)


See Output 15.2.1 for an illustration.
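
As a minimal sketch, the following statements anchor the curve at the bottom left corner of the first bar of a vertical chart by using the BL keyword from Table 15.7. The data set failure and the process variable cause are carried over from the example at the start of this section.

```sas
proc pareto data=failure;
   /* BL anchors the curve at the bottom left corner of the first bar */
   vbar cause / anchor = bl;
run;
```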

AXISFACTOR=value

specifies a factor used in scaling the frequency axis. This factor determines (approximately) the ratio of the length of the axis to the length of the tallest bar, and it is used to provide space for the cumulative percentage curve. The value must be greater than or equal to 1.

By default, the factor is chosen so that the curve is anchored at the top right corner of the first bar (see also the ANCHOR= option). However, if anchoring to the top of the first bar causes the bars to be flattened excessively, a smaller default factor is used.

This option is not applicable if the cumulative percentage curve is suppressed by the NOCURVE option.

BARLABEL=CMPCT | COUNT | VALUE | (variable-list)

requests that a label be displayed for each bar. You can specify the following values:

CMPCT

specifies that the label indicates the cumulative percentage that is associated with that bar. An alternative to BARLABEL=CMPCT is the CMPCTLABEL option, which labels points on the cumulative percentage curve with their values.

COUNT

specifies that the label displays the count for the bar, regardless of the SCALE= option setting.

VALUE

specifies that the label indicates the height of the bar in the units used by the frequency axis. The units are determined by the SCALE= option setting. See Example 15.8 for an illustration.

(variable-list)

specifies that the label displays the values of one or more variables from the input data set. If a format is associated with a variable, then the formatted value is displayed. Values can be up to 32 characters long. The variable values must be consistent within observations that correspond to a particular Pareto category. The variables are saved in the OUT= data set. If you specify more than one process variable in the chart statement, you can specify more than one variable in variable-list. The BARLABEL= and process variables are matched by their positions in their respective variable lists.

The space in horizontal Pareto charts might be insufficient to display long bar labels. You can specify the AXISFACTOR= option to increase the available space beyond the bars. If you are producing traditional graphics, you can use the BARLABPOS= option to specify how labels are positioned relative to the bars.
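
The following sketch shows one way to combine bar labels with extra axis space in a horizontal chart. The data set failure and the variable cause come from the example at the start of this section; the factor 1.5 is an arbitrary illustrative value, not a recommended setting.

```sas
proc pareto data=failure;
   /* Label each bar with its height in frequency-axis units and   */
   /* lengthen the axis to leave room for the labels past the bars */
   hbar cause / barlabel   = value
                axisfactor = 1.5;
run;
```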

BARLEGEND=(variable-list)

requests that a legend be added to the chart to explain colors for bars that are specified in the BARS= or CBARS= option, or patterns for bars that are specified in the PBARS= option. The variable-list must be enclosed in parentheses even if only one variable is specified. See Output 15.4.1 for an illustration.

The values of the variables in variable-list provide the explanatory labels used in the legend. If a format is associated with the variable, then the formatted value is displayed. Values can be up to 32 characters long.

This option is not applicable unless you specify one or more of the BARS=, CBARS=, or PBARS= options. In the DATA= data set, the values of the BARLEGEND= variable must be identical in observations for which the value of the BARS=, CBARS=, or PBARS= variable (or the combination of the CBARS= and PBARS= values) is the same. This ensures that the legend derived from the BARLEGEND= variable is consistent.

If you specify more than one process variable in a chart statement and a corresponding list of BARS=, CBARS=, or PBARS= variables, you can specify a list of BARLEGEND= variables. The number of variables in variable-list should be less than or equal to the number of process variables. The lists of variables are matched so that the first variable in variable-list is applied to the first process variable and the first BARS=, CBARS=, or PBARS= variable; the second variable in variable-list is applied to the second process variable and the second BARS=, CBARS=, or PBARS= variable; and so forth. If the process variable list is longer than variable-list, the charts for the extra process variables do not display a bar legend.

BARLEGLABEL='label'

specifies the label to be displayed to the left of the legend that is created by the BARLEGEND= option. See Output 15.4.1 for an illustration.

The BARLEGLABEL= option is applicable only in conjunction with BARS=, CBARS=, or PBARS= variables. The label can be up to 16 characters and must be enclosed in quotation marks.

If you do not specify a label, the BARLEGEND= variable label is displayed (unless the label is longer than 16 characters, in which case the variable name is displayed). If you do not specify the BARLEGLABEL= option and no label is associated with the BARLEGEND= variable, no legend label is displayed.

BARS=(variable-list)

uses different colors to group bars of the Pareto chart for display. Bars that correspond to the same value of a variable in variable-list are assigned the same color from the ODS style. You cannot specify the BARS= option in conjunction with the CHIGH(n) or CLOW(n) options.

If you specify more than one process variable, you can specify more than one variable in variable-list. The number of variables in variable-list should be less than or equal to the number of process variables. The two lists of variables are paired in order of their specification. If a BARS= variable is not provided for a process variable, the bars for that chart are filled with the default color from the ODS style.
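
The BARS=, BARLEGEND=, and BARLEGLABEL= options are typically used together. The sketch below assumes a hypothetical variable named severity in the failure data set that takes a single value for each cause; that variable is an illustration and is not part of the original example.

```sas
proc pareto data=failure;
   /* severity is a hypothetical grouping variable.                 */
   /* Using it for both BARS= and BARLEGEND= keeps the legend text  */
   /* consistent with the color groups.                             */
   vbar cause / bars        = (severity)
                barlegend   = (severity)
                barleglabel = 'Severity:';
run;
```

The legend label here is 9 characters long, within the 16-character limit.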

CATLEGEND=AUTO | OFF | ON

specifies whether a category legend is created for ODS Graphics output. You can specify the following values:

AUTO

creates a category legend only when the labels would be too crowded on the category axis.

OFF

suppresses the category legend.

ON

creates a category legend.

By default, CATLEGEND=AUTO. This option is ignored if ODS Graphics is not enabled.

CATLEGLABEL='label'

specifies a label for the category legend. A category legend is created when there is insufficient space to label the categories along the category axis or when requested in the CATLEGEND= option. The label can be up to 16 characters and must be enclosed in quotation marks. The default label is "Categories:". See Example 15.3 for an illustration. This option is ignored when no category legend is produced.

CATOFFSET=value

specifies the length of the offset at both ends of the category axis (in screen percentage units). You can eliminate the offset by specifying CATOFFSET=0.

CATREF='value-list'

specifies where reference lines perpendicular to the Pareto category axis are to appear on the chart. Character values can be up to 64 characters and must be enclosed in quotation marks. The values must be values of the process variable regardless of whether the bars are numbered and a category legend is introduced.

CATREFLABELS='label1' ... 'labeln'

specifies labels for the lines that are requested in the CATREF= option. The number of labels must equal the number of lines requested. Labels can be up to 16 characters and must be enclosed in quotation marks.

CFRAMENLEG
CFRAMENLEG=EMPTY
CFRAMENLEG=color

displays a frame around the sample size legend that is requested in the NLEGEND option. You can specify this option in the following ways:

(no argument)

fills the frame with the background color that is specified by the Color attribute of the GraphBackground style element in the current ODS style.

EMPTY

produces a frame that has a transparent background.

color

produces a frame whose background is color when you are producing traditional graphics.

CHARTTYPE=CUMULATIVE | INTERVALS<(interval-options)> | STANDARD

specifies the type of Pareto chart to be produced. This option is supported only for ODS Graphics output. You can specify the following options:

CUMULATIVE

creates a cumulative Pareto bar chart.

INTERVALS<(interval-options)>

creates a Pareto dot plot that includes acceptance intervals, which are computed using simulation. You can specify the following interval-options for computing acceptance intervals:

ALPHA=value

specifies the significance level for the acceptance intervals. By default, ALPHA=0.05.

NSAMPLES=n

specifies the number of random samples used in the simulation. By default, NSAMPLES=2000.

SEED=n

specifies the seed value for the random number generator that is used in the simulation. By default, or when you specify n ≤ 0, a seed value is generated by using the system clock.

STANDARD

creates a traditional Pareto chart.

By default, CHARTTYPE=STANDARD.

Wilkinson (2006) describes the advantages of the cumulative Pareto bar chart and the Pareto dot plot that includes acceptance intervals. See Example 15.9 for examples of these alternative Pareto charts.
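
As a hedged sketch of the INTERVALS syntax, the following statements request a Pareto dot plot with acceptance intervals. The values for ALPHA=, NSAMPLES=, and SEED= are arbitrary illustrations, and the space-separated layout of the interval-options is an assumption based on usual SAS option syntax.

```sas
ods graphics on;   /* CHARTTYPE= is supported only for ODS Graphics output */

proc pareto data=failure;
   /* A positive seed makes the simulated intervals reproducible */
   vbar cause / charttype = intervals(alpha=0.10 nsamples=5000 seed=12345);
run;
```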

CHIGH(n)
CHIGH(n)=color

highlights the bars that have the n highest frequencies by filling them with a contrasting color from the ODS style. When producing traditional graphics output, you can specify CHIGH(n)=color to select a specific color. You cannot use the CHIGH(n) option in conjunction with a BARS= or CBARS= variable, but you can use it together with the CLOW(n) and CBARS=color options. See Output 15.3.1 for an illustration.

CLASS=variable
CLASS=(variable1 variable2)

creates a comparative Pareto chart by using the levels of the variables. If you specify two variables, you must enclose them in parentheses. See Example 15.1 and Example 15.2.

If you specify a single variable, the observations in the input data set are classified by the formatted values (levels) of the variable. A Pareto chart is created for the process variable values in each level, and these component charts (referred to as cells) are arranged in an array. The cells are labeled with the class levels, and uniform horizontal and vertical axes are used to facilitate comparisons.

If you specify two variables, the observations in the input data set are cross-classified by the values (levels) of the variables. A Pareto chart is created for the process variable values in each cell of the cross-classification, and these charts are arranged in a matrix. The levels of the first variable label the rows, and the levels of the second variable label the columns. Uniform horizontal and vertical axes are used to facilitate comparisons.

The variables can be numeric or character. The maximum length of a character variable is 32. If a format is associated with a variable, the formatted values determine the levels. Only the first 32 characters of the formatted values are used to determine the levels. You can specify whether missing values are treated as a level by using the MISSING1 and MISSING2 options.

In traditional graphics output, only the level values are displayed in row and column headers. If a label is associated with a variable, the label is displayed in a second header that spans the row or column headers.

CLASSKEY='value'
CLASSKEY=('value1' 'value2')

specifies the key cell in a comparative Pareto chart, which is created when you specify the CLASS= option. The key cell is defined as the cell in which the Pareto bars are arranged in decreasing order. This order then determines the uniform category axis used for all the cells.

If you specify CLASS=variable, you can specify CLASSKEY='value' to identify the key cell as the level for which the variable is equal to value. The value can have up to 32 characters, and you must specify a formatted value. By default, the levels are sorted as specified by the ORDER1= option, and the key cell is the level that occurs first in this order. The cells are displayed in this order from top to bottom (or left to right, depending on the NCOLS= and NROWS= values), and consequently the key cell is displayed at the top or at the left. The cell you specify in the CLASSKEY= option is displayed at the top or at the left unless you also specify the NOKEYMOVE option.

If you specify CLASS=(variable1 variable2), you can specify CLASSKEY=('value1' 'value2') to identify the key cell as the level for which variable1 is equal to value1 and variable2 is equal to value2. Here, value1 and value2 must be formatted values, and they must be enclosed in quotation marks. By default, the levels of variable1 are sorted in the order determined by the ORDER1= option, and then within each of these levels, the levels of variable2 are sorted in the order determined by the ORDER2= option. The default key cell is the combination of levels of variable1 and variable2 that occurs first in this order. The cells are displayed in order of variable1 from top to bottom and in order of variable2 from left to right. Consequently, the default key cell is displayed in the upper left corner. The cell you specify in the CLASSKEY= option is displayed in the upper left corner unless you also specify the NOKEYMOVE option.

For an example of the use of the CLASSKEY= option, see Output 15.1.3.
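
A one-way comparative chart with an explicit key cell can be sketched as follows. The variable stage and its formatted values 'Before' and 'After' are hypothetical and are used only to illustrate the syntax.

```sas
proc pareto data=failure;
   /* stage is a hypothetical classification variable.              */
   /* The 'Before' cell sets the bar order used by every cell and   */
   /* is moved to the top (or left) unless NOKEYMOVE is specified.  */
   vbar cause / class    = stage
                classkey = 'Before';
run;
```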

CLOW(n)
CLOW(n)=color

highlights the bars that have the n lowest frequencies by filling them with a contrasting color from the ODS style. When producing traditional graphics output, you can specify CLOW(n)=color to select a specific color. You cannot use the CLOW(n) option in conjunction with a BARS= or CBARS= variable, but you can use it together with the CBARS=color and CHIGH(n) options.

CMPCTLABEL

labels points on the cumulative percentage curve with their values. By default, the points are not labeled.

CPROP
CPROP=EMPTY
CPROP=color

requests that a proportion-of-frequency bar of the specified color be displayed horizontally across the top of each tile in a comparative Pareto chart. You can specify the following values:

(no argument)

creates bars that are filled with a color from the ODS style.

EMPTY

produces empty bars in traditional graphics output.

color

produces bars that are filled with color in traditional graphics output.

The length of the bar relative to the width of the tile indicates the proportion of the total frequency count in the chart that is represented by the tile. You can use the bars to visualize the distribution of frequency count by tile. See Output 15.1.4 for an illustration.

The CPROP= option provides a graphical alternative to the NLEGEND option, which displays the actual count. The CPROP= option is applicable only with comparative Pareto charts.

CUMAXIS=value-list

specifies tick mark values for the cumulative percentage axis. The values must be equally spaced and in increasing order, and the first value must be 0. You must scale the values in percentage units, and the last value must be greater than or equal to 100.

CUMAXISLABEL='label'

specifies a label, up to 40 characters, for the cumulative percentage axis. The default label is "Cumulative Percent" or "Cm Pct," depending on the space available.

CUMREF=value-list

requests reference lines perpendicular to the cumulative percentage axis at the specified values. You must specify the values in cumulative percentage units.

CUMREFLABELS='label1' ... 'labeln'

specifies labels for the lines that are requested in the CUMREF= option. The number of labels must equal the number of lines requested. Enclose the labels in quotation marks. Labels can be up to 16 characters.

FREQ=variable

specifies a frequency variable whose values provide the counts (numbers of occurrences) of the values of the process variable. Specifying a frequency variable is equivalent to replicating the observations in the input data set. The variable must be a numeric variable that has nonnegative integer values. See Creating a Pareto Chart from Frequency Data for an illustration. If you specify more than one process variable in the chart statement, the variable values are used with each process variable. If you do not specify this option, each value of the process variable is counted exactly once.
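
The following sketch shows the general shape of frequency data and how it might be charted. The data set name failure_counts, the variable name counts, and all of the values are made up for illustration.

```sas
/* Hypothetical frequency data: one observation per cause, */
/* with the number of occurrences stored in counts         */
data failure_counts;
   length cause $ 16;
   input cause $ 1-16 counts;
   datalines;
Contamination   14
Corrosion        2
Doping           1
Metallization    2
Miscellaneous    3
Oxide Defect     8
Silicon Defect   1
;

proc pareto data=failure_counts;
   /* Each observation is counted counts times instead of once */
   vbar cause / freq = counts;
run;
```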

FREQAXIS=value-list

specifies tick mark values for the frequency axis. The values must be equally spaced and in increasing order, and the first value must be 0. You must scale the values in the same units as the bars (see the SCALE= option), and the last value must be greater than or equal to the height of the largest bar.
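
A minimal sketch of explicit tick marks on both axes follows. It assumes that the tallest bar in the failure data set has a count of 40 or less; the upper limit of the FREQAXIS= list would need to be adjusted to the actual data.

```sas
proc pareto data=failure;
   /* Both lists start at 0, are equally spaced, and increase. */
   /* FREQAXIS= is in counts because SCALE=COUNT is specified. */
   vbar cause / scale    = count
                freqaxis = 0 to 40 by 10
                cumaxis  = 0 to 100 by 20;
run;
```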

FREQAXISLABEL='label'

specifies a label, up to 40 characters, for the frequency axis. If a WEIGHT= variable is specified, its label is the default frequency axis label. Otherwise, the default label depends on the value of the SCALE= option.

FREQOFFSET=value

specifies the length in screen percentage units of the offset at the upper end of the frequency axis.

FREQREF=value-list

specifies where reference lines perpendicular to the frequency axis are to appear on the chart. You must specify the values in the same units that are used to scale the frequency axis. By default, the frequency axis is scaled in percentage units, but you can specify other units in the SCALE= option. See Output 15.2.3 for an illustration.

FREQREFLABELS='label1' ... 'labeln'

specifies labels for the lines that are requested in the FREQREF= option. The number of labels must equal the number of lines requested. Enclose the labels in quotation marks. Labels can be up to 16 characters.
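
Reference lines and their labels on both axes can be sketched as follows. The positions 20 and 80 and the label text are illustrative choices; the frequency axis is assumed to use the default percentage scaling.

```sas
proc pareto data=failure;
   /* One label per reference line, each at most 16 characters */
   vbar cause / freqref       = 20
                freqreflabels = '20% of total'
                cumref        = 80
                cumreflabels  = '80% cumulative';
run;
```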

GRID

adds a grid that corresponds to the frequency axis to the Pareto chart. Grid lines are positioned at tick marks on the frequency axis. The lines are useful for comparing the heights of the bars.

GRID2

adds a grid that corresponds to the cumulative percentage axis to the Pareto chart. Grid lines are positioned at tick marks on the cumulative percentage axis. The lines are useful for reading the cumulative percentage curve.

HLLEGLABEL='label'

specifies a label for the legend that is automatically created when you use a combination of the CHIGH(n), CLOW(n), PHIGH(n), and PLOW(n) options. See Output 15.3.1 for an illustration. The label can be up to 16 characters and must be enclosed in quotation marks. The default label is "Bars:".

HREFLABPOS=n

specifies the vertical position of labels for reference lines that are associated with horizontal axes, which are specified in the FREQREF= and CUMREF= options in an HBAR statement or the CATREF= option in a VBAR statement. The available positions are described in the following table.

n    Position
1    Along top of chart
2    Staggered from top to bottom of chart
3    Along bottom of chart
4    Staggered from bottom to top of chart

By default, HREFLABPOS=1. Note: HREFLABPOS=2 and HREFLABPOS=4 are not supported for ODS Graphics output.

INTERTILE=value

specifies the distance in horizontal screen percentage units between tiles (cells) in a comparative Pareto chart. When ODS Graphics is enabled, the default value is 2%. In traditional graphics, the tiles are contiguous by default. See Output 15.1.3 for an illustration.

LABOTHER='other-label'

specifies a label for the 'other' category that is optionally specified in the OTHER= option. Use this option in conjunction with the BARLABEL=(variable) option.

LAST='category'

requests that the bar that corresponds to category be displayed last (at the bottom of a horizontal chart or the right end of a vertical chart) regardless of the frequency that is associated with this category. The category must be a formatted value of the process variable and must be enclosed in quotation marks. The category can be up to 64 characters. See Figure 15.6 for an illustration.

LOTHER='label'

specifies a label for the bar that is defined in the OTHER= option. This label appears in the legend that is specified in the BARLEGEND= option. The label must be enclosed in quotation marks and can be up to 32 characters. The default is the value that is specified in the OTHER= option. The LOTHER= option is applicable only when a BARLEGEND= variable is specified.

MARKERS

requests that the points on the cumulative percentage curve be plotted with markers in ODS Graphics output. You can use a SYMBOL statement to plot the points in traditional graphics output.

MAXCMPCT=percent

requests that only the Pareto categories that have the highest frequency counts be displayed, where the sum of their corresponding percentages is less than or equal to percent. For example, if you specify the following statements, the chart displays only the most frequently occurring categories that account for no more than 90% of the total frequency:

proc pareto data=failure;
   vbar cause / maxcmpct = 90;
run;

You can use the OTHER= option in conjunction with the MAXCMPCT= option to create and display a new category that combines categories that are not selected by the MAXCMPCT= option. For example, if you specify the following statements, the chart displays the categories that account for no more than 90% of the total frequency, together with a category labeled "Others" that merges the remaining categories:

proc pareto data=failure;
   vbar cause / maxcmpct = 90
                other    = 'Others';
run;

The MAXCMPCT= option is an alternative to the MINPCT= and MAXNCAT= options.

MAXNCAT=n

requests that only the Pareto categories with the n highest frequencies be displayed. For example, if you specify the following statements, the chart displays only the categories that have the 20 highest frequencies:

proc pareto data=failure;
   vbar cause / maxncat = 20;
run;

If the total number of categories is less than 20, all the categories are displayed.

You can use the OTHER= option in conjunction with the MAXNCAT= option to create and display a new category that combines categories that are not selected by the MAXNCAT= option. For example, if you specify the following statements, the chart displays the categories that have the 19 highest frequencies, together with a category labeled "Others" that merges the remaining categories:

proc pareto data=failure;
   vbar cause / maxncat = 20
                other   = 'Others';
run;

See Figure 15.6 for another illustration.

The MAXNCAT= option is an alternative to the MINPCT= and MAXCMPCT= options.

MINPCT=percent

requests that only the Pareto categories whose frequency percentages are greater than or equal to percent be displayed. For example, if you specify the following statements, the chart displays only categories that have at least 5% of the total frequency:

proc pareto data=failure;
   vbar cause / minpct = 5;
run;

You can use the OTHER= option in conjunction with the MINPCT= option to create and display a new category that combines categories that are not selected by the MINPCT= option. The merged category that is created by the OTHER= option is displayed even if its total percentage is less than percent. For example, if you specify the following statements, the chart displays the categories whose percentages are greater than or equal to 5%, together with a category labeled "Others" that merges the remaining categories:

proc pareto data=failure;
   vbar cause / minpct = 5
                other  = 'Others';
run;

The MINPCT= option is an alternative to the MAXNCAT= and MAXCMPCT= options.

MISSING

requests that missing values of the process variable be treated as a Pareto category that is represented with a bar on the chart. If the process variable is a character variable, a missing value is defined as a blank internal (unformatted) value. If the process variable is numeric, a missing value is defined as any of the SAS missing values. If you do not specify this option, missing values are excluded from the analysis.

MISSING1

requests that missing values of the first CLASS= variable be treated as a level of the CLASS= variable. If the first CLASS= variable is a character variable, a missing value is defined as a blank internal (unformatted) value. If the first CLASS= variable is numeric, a missing value is defined as any of the SAS missing values. If you do not specify this option, observations in the DATA= data set for which the first CLASS= variable is missing are excluded from the analysis.

MISSING2

requests that missing values of the second CLASS= variable be treated as a level of the CLASS= variable. If the second CLASS= variable is a character variable, a missing value is defined as a blank internal (unformatted) value. If the second CLASS= variable is numeric, a missing value is defined as any of the SAS missing values. If you do not specify this option, observations in the DATA= data set for which the second CLASS= variable is missing are excluded from the analysis.

NCOLS=n
NCOL=n

specifies the number of columns in a comparative Pareto chart. You can use this option in conjunction with the NROWS= option. See Output 15.2.3 and Output 15.2.4 for an illustration. By default, NCOLS=1 and NROWS=2 if one CLASS= variable is specified, and NCOLS=2 and NROWS=2 if two CLASS= variables are specified.

NLEGEND
NLEGEND='label'
NLEGEND=(variable)

requests a sample size legend and specifies its form. You can specify the following values:

(no argument)

requests a sample size legend and specifies its form as N=n, where n is the total count for the Pareto categories. In a comparative Pareto chart, a legend is displayed in each tile, and n is the total count for that particular cell. See Output 15.2.1 for an illustration.

'label'

requests a sample size legend and specifies its form as label=n, where n is the total count for the Pareto categories. The label can be up to 32 characters and must be enclosed in quotation marks. For an illustration, see Figure 15.4 or Output 15.1.4.

(variable)

requests a sample size legend that is the value of variable from the DATA= data set. The formatted length of variable cannot exceed 32. If a format is associated with variable, then the formatted value is displayed. This option is intended for use with comparative Pareto charts and enables you to display a customized legend inside each tile (this legend does not need to provide a total count). It is assumed that the values of variable are identical for all observations in a particular class.

By default, the legend is placed in the upper left corner of the chart. If you specify the NOCURVE option, the legend is placed in the upper right corner of the chart. You can use the CFRAMENLEG= option to frame the sample size legend. No sample size legend is displayed if you do not specify an NLEGEND option.
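
As a short sketch, the following statements request a labeled sample size legend and frame it. The label text is an arbitrary example.

```sas
proc pareto data=failure;
   /* The legend takes the form label=n; CFRAMENLEG with no */
   /* argument adds a frame filled from the ODS style       */
   vbar cause / nlegend = 'Total Failures'
                cframenleg;
run;
```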

NOCATLABEL

suppresses the category axis label. This option is useful for avoiding clutter where the meaning of the category axis is apparent from the labels for the Pareto categories. See Output 15.2.2 for an illustration.

NOCHART

suppresses the creation of a Pareto chart. This option is useful when you are simply creating an output data set.

NOCUMLABEL

suppresses the cumulative percentage axis label. This option is useful for avoiding clutter on comparative Pareto charts.

NOCUMTICK

suppresses the cumulative percentage axis label, tick marks, and tick mark labels.

NOCURVE

suppresses the cumulative percentage curve and the cumulative percentage axis. Compare Output 15.2.1 and Output 15.2.2 for an illustration.

NOFREQLABEL

suppresses the frequency axis label.

NOFREQTICK

suppresses the frequency axis label, tick marks, and tick mark labels.

NOHLLEG

suppresses the legend that is generated by the CHIGH(n)=, CLOW(n)=, PHIGH(n)=, and PLOW(n)= options.

NOKEYMOVE

suppresses the rearrangement of cells within a comparative Pareto chart that occurs when you use the CLASSKEY= option. By default, the key cell appears in the top left corner of a comparative Pareto chart.

NROWS=n
NROW=n

specifies the number of rows in a comparative Pareto chart. You can use the NROWS= option in conjunction with the NCOLS= option. See Output 15.2.3 and Output 15.2.4 for an illustration. By default, NROWS=2.

ODSFOOTNOTE=FOOTNOTE | FOOTNOTE1 | 'string'

adds a footnote to ODS Graphics output. You can specify the following values:

FOOTNOTE (or FOOTNOTE1)

uses the value of the SAS FOOTNOTE statement as the graph footnote.

'string'

uses string as the footnote. The quoted string can contain either of the following escaped characters, which are replaced with the appropriate values from the analysis:

\n

is replaced by the process variable name.

\l

is replaced by the process variable label (or name if the process variable has no label).

ODSFOOTNOTE2=FOOTNOTE2 | 'string'

adds a secondary footnote to ODS Graphics output. You can specify the following values:

FOOTNOTE2

uses the value of the SAS FOOTNOTE2 statement as the secondary graph footnote.

'string'

uses string as the secondary footnote. The quoted string can contain any of the following escaped characters, which are replaced with the appropriate values from the analysis:

\n

is replaced by the process variable name.

\l

is replaced by the process variable label (or name if the process variable has no label).

ODSTITLE=TITLE | TITLE1 | NONE | DEFAULT | LABELFMT | 'string'

specifies a title for ODS Graphics output. You can specify the following values:

TITLE (or TITLE1)

uses the value of the SAS TITLE statement as the graph title.

NONE

suppresses all titles from the graph.

DEFAULT

uses the default ODS Graphics title (a descriptive title that consists of the plot type and the process variable name).

LABELFMT

uses the default ODS Graphics title, but substitutes the process variable label for the process variable name.

'string'

uses string as the graph title. The quoted string can contain the following escaped characters, which are replaced with the appropriate values from the analysis:

\n

is replaced by the process variable name.

\l

is replaced by the process variable label (or name if the process variable has no label).

ODSTITLE2=TITLE2 | 'string'

specifies a secondary title for ODS Graphics output. You can specify the following values:

TITLE2

uses the value of the SAS TITLE2 statement as the secondary graph title.

'string'

uses string as the secondary graph title. The quoted string can contain the following escaped characters, which are replaced with the appropriate values from the analysis:

\n

is replaced by the process variable name.

\l

is replaced by the process variable label (or name if the process variable has no label).
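
The escaped characters can be sketched in use as follows. The title and footnote wording is illustrative; \l and \n are replaced as described above.

```sas
ods graphics on;   /* ODSTITLE= and ODSFOOTNOTE= apply to ODS Graphics output */

proc pareto data=failure;
   /* \l is replaced by the process variable label (or name), */
   /* \n by the process variable name                          */
   vbar cause / odstitle    = 'Pareto Analysis of \l'
                odsfootnote = 'Process variable: \n';
run;
```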

ORDER1=DATA | FORMATTED | FREQ | INTERNAL

specifies the display order for the values of the first CLASS= variable. The levels of the first CLASS= variable are always constructed using the formatted values of the variable, and the formatted values are always used to label the rows (columns) of a comparative Pareto chart. You can specify the following values:

DATA

displays the rows (columns) from top to bottom (left to right) in the order in which the values of the first CLASS= variable first appear in the input data set.

FORMATTED

displays the rows (columns) from top to bottom (left to right) in increasing order of the formatted values of the first CLASS= variable. For example, suppose you use a numeric CLASS= variable called Day (with values 1, 2, and 3) to create a one-way comparative Pareto chart. Also suppose you use the FORMAT procedure to associate the formatted values 1 = 'Wednesday', 2 = 'Thursday', and 3 = 'Friday' with Day. If you specify ORDER1=FORMATTED, the rows appear in alphabetical order (Friday, Thursday, Wednesday) from top to bottom.

FREQ

displays the rows (columns) from top to bottom (left to right) in order of decreasing frequency count. If two or more classes have the same frequency count, the order is determined by the formatted values.

INTERNAL

displays the rows (columns) from top to bottom (left to right) in increasing order of the internal (unformatted) values of the first CLASS= variable. If there are two or more distinct internal values that have the same formatted value, the order is determined by the internal value that occurs first in the input data set. In the previous example with variable Day, if you specify ORDER1=INTERNAL, the rows of the comparative chart appear in chronological order (Wednesday, Thursday, Friday) from top to bottom.

By default, ORDER1=INTERNAL.
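The Day example can be sketched as follows. The data set Failure_By_Day, its values, and the format name DAYFMT are hypothetical and serve only to illustrate the ordering; the FREQ= option supplies the counts.

```sas
proc format;
   value dayfmt 1 = 'Wednesday' 2 = 'Thursday' 3 = 'Friday';
run;

/* Hypothetical input data: one row per cause and day */
data failure_by_day;
   length cause $ 16;
   input cause $ day counts;
   format day dayfmt.;
   datalines;
Contamination 1 14
Corrosion     1  2
Contamination 2 12
Corrosion     2  4
Contamination 3  9
Corrosion     3  5
;

proc pareto data=failure_by_day;
   /* ORDER1=FORMATTED orders the rows alphabetically by formatted value */
   vbar cause / class  = day
                freq   = counts
                order1 = formatted;
run;
```

Omitting the ORDER1= option (or specifying ORDER1=INTERNAL) would instead order the rows by the internal values 1, 2, and 3.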

ORDER2=INTERNAL | FORMATTED | DATA | FREQ

specifies the display order for the values of the second CLASS= variable. The levels of the second CLASS= variable are always constructed using the formatted values of the variable, and the formatted values are always used to label the columns of a two-way comparative Pareto chart.

The PARETO procedure determines the layout of a two-way comparative Pareto chart by first using the ORDER1= option to obtain the order of the rows from top to bottom (recall that ORDER1=INTERNAL by default). Then the ORDER2= option is applied to the observations that correspond to the first row to obtain the order of the columns from left to right. If any columns remain unordered (that is, the categories are unbalanced), the ORDER2= option is applied to the observations in the second row, and so on until all the columns have been ordered.

The values of the ORDER2= option are interpreted as described for the ORDER1= option. By default, ORDER2=INTERNAL.

OTHER='category'

specifies a new category that merges all categories that are not selected in the MAXNCAT=, MINPCT=, or MAXCMPCT= options. See the section Restricting the Number of Pareto Categories for an illustration.

The category should be specified as a formatted value of the process variable. The category can be up to 32 characters and must be enclosed in quotation marks. If you specify an OUT= data set, you should also specify an internal value that corresponds to category by specifying the OTHERCVAL= option or the OTHERNVAL= option.

The OTHER= option is not applicable unless you specify the MAXNCAT=, MINPCT=, or MAXCMPCT= option. You can use the COTHER=, LOTHER=, POTHER=, OTHERCVAL=, and OTHERNVAL= options with the OTHER= option.

OTHERCVAL='value'

specifies the internal (unformatted) value for a character process variable in the OUT= data set that corresponds to the category that is specified in the OTHER= option. The value can be up to 64 characters and must be enclosed in quotation marks.

The OTHERCVAL= option is not applicable unless you specify the OTHER= and OUT= options. If you specify the OTHER= option but not the OTHERCVAL= option, the value specified in the OTHER= option is written to the OUT= data set.

OTHERNVAL=value

specifies the internal (unformatted) value for a numeric process variable in the OUT= data set that corresponds to the category that is specified in the OTHER= option. The OTHERNVAL= option is not applicable unless you specify the OTHER= and OUT= options. If you specify the OTHER= option but not the OTHERNVAL= option, a missing value is written to the OUT= data set.

OUT=SAS-data-set

creates an output data set that contains the information that is displayed in the Pareto chart. This data set is useful if you want to create a report to accompany your chart. See Example 15.8 for an illustration.
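The following statements are a minimal sketch of how the OTHER=, OTHERCVAL=, and OUT= options work together. They assume the Failure data set from the start of this section and assume that Cause is a character variable; the category text 'Others' and the output data set name Summary are hypothetical.

```sas
proc pareto data=failure;
   vbar cause / maxncat   = 3          /* keep the three largest categories      */
                other     = 'Others'   /* merge the remaining categories         */
                othercval = 'Others'   /* internal value written to OUT= for it  */
                out       = summary;   /* hypothetical output data set name      */
run;

/* Inspect the information saved from the chart */
proc print data=summary;
run;
```

If Cause were numeric, the OTHERNVAL= option would be used in place of the OTHERCVAL= option.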

SCALE=COUNT | FREQUENCY | PERCENT | WEIGHT

specifies the scale for the frequency axis. You can specify the following values:

COUNT or FREQUENCY

specifies that the scale is counts. See Output 15.1.4 for an illustration. This option is ignored if you specify the WEIGHT= option.

PERCENT

specifies that the scale is the percentage of the total frequency or, if you specify the WEIGHT= option, the percentage of the total weight.

WEIGHT

scales the vertical axis in the same units as the variable you specify in the WEIGHT= option. This option applies only if you specify the WEIGHT= option.

By default, SCALE=PERCENT. See Output 15.8.1 for an example.

Note: Regardless of the value you specify for the SCALE= option, the cumulative percentage axis is scaled in cumulative percentage units.

URL=variable

specifies URLs as values of the specified character variable (or formatted values of a numeric variable). These URLs are associated with bars on the Pareto chart when ODS Graphics output is directed into HTML. The value of variable should be the same for each observation that has a particular value of the process variable. The URL= option is not supported for traditional graphics output.

VREFLABPOS=n

specifies the horizontal positioning of the labels for reference lines that are associated with vertical axes, which are specified in the CATREF= option in an HBAR statement or in the FREQREF= and CUMREF= options in a VBAR statement. If you specify VREFLABPOS=1, the labels are positioned at the left of the chart; if you specify VREFLABPOS=2, the labels are positioned at the right. By default, VREFLABPOS=1.

WEIGHT=variable-list

specifies weight variables that are used to construct weighted Pareto charts. Variables in the variable-list are paired with the process variables in order of specification. The WEIGHT= variables must be numeric, and their values must be nonnegative (noninteger values are permitted). If a WEIGHT= variable is not provided for a process variable, the weights applied to that process variable are assumed to be 1. See Weighted Pareto Charts for computational details.

A WEIGHT= variable is particularly useful for carrying out a Pareto analysis based on cost rather than frequency of occurrence. See Example 15.8 for an illustration.
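As a short sketch of a cost-based analysis, the following statements assume that the Failure data set contains a numeric variable named Cost (a hypothetical name) that holds the cost associated with each observation:

```sas
proc pareto data=failure;
   /* Weight each occurrence of Cause by Cost and label the */
   /* frequency axis in the units of the Cost variable      */
   vbar cause / weight = cost
                scale  = weight;
run;
```

With SCALE=PERCENT (the default), the same chart would instead show each category as a percentage of the total weight.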

Options for Traditional Graphics

You can specify the following options only when traditional graphics are produced. The PARETO procedure produces traditional graphics when ODS Graphics is disabled and SAS/GRAPH is licensed.

ANGLE=value

specifies an angle in degrees for rotating the labels on the category axis. The value is the angle between the baseline of the label and the category axis. See Output 15.1.1 and Output 15.1.2 for an illustration. The value must be greater than or equal to –90 and less than 90. The default value is 0.

ANNOKEY

applies the annotation requested in the ANNOTATE= and ANNOTATE2= options only to the key cell in a comparative Pareto chart. By default, annotation is applied to all of the cells.

ANNOTATE=SAS-data-set
ANNO=SAS-data-set

specifies an input data set that contains annotation variables as described in SAS/GRAPH: Reference. You can use the SAS-data-set to customize the Pareto charts that are produced by a single HBAR or VBAR statement. (A data set that is specified in the ANNOTATE= option in the PROC PARETO statement customizes charts that are produced by all HBAR and VBAR statements.) The SAS-data-set is associated with the frequency axis. If the annotation is based on data coordinates, you must use the same units as the frequency axis.

ANNOTATE2=SAS-data-set
ANNO2=SAS-data-set

specifies an input data set that contains annotation variables as described in SAS/GRAPH: Reference. You can use the SAS-data-set to customize the Pareto charts that are produced by a single HBAR or VBAR statement. (A data set that is specified in the ANNOTATE2= option in the PROC PARETO statement customizes charts that are produced by all HBAR and VBAR statements.) The SAS-data-set is associated with the cumulative percentage axis. If the annotation is based on data coordinates, you must use the same units as the cumulative percentage axis.

BARLABPOS=keyword

specifies the position for labels that are requested in the BARLABEL= option.

You can specify the following keywords in an HBAR statement:

HBAR

displays the label right-justified on the bar. If the label is longer than the bar, it is left-justified at the base of the bar.

HFIT

right-justifies the label on the bar. If the label is longer than the bar, the label is displayed to the right of the bar.

HLJUST

left-justifies the label at the base of the bar.

HRIGHT

displays the label to the right of the bar. If there is insufficient space for the label to the right of the bar, the label is right-justified at the right edge of the frame.

HRJUST

right-justifies the label at the right edge of the frame.

The default for an HBAR statement is BARLABPOS=HRIGHT.

You can specify the following keywords in a VBAR statement:

HCENTER

centers the label horizontally above the bar. If the centered label would extend outside the frame, the label is left-justified or right-justified at the edge of the frame.

HLJUST

left-justifies the label horizontally above the bar. The label is truncated if necessary.

VBAR

displays the label vertically on the bar. If the label is longer than the bar, it extends above the bar.

VFIT

displays the label vertically on or above the bar, depending on the available space. If the label is longer than the bar, it is displayed just below the top edge of the frame.

The default for a VBAR statement is to center the labels horizontally above the bars, with a reduction in text height if necessary. Reduction is not applied when the BARLABPOS= option is specified.

BARWIDTH=value

specifies the width of the bars in screen percentage units. By default, the bars are made as wide as possible.

CAXIS=color
CAXES=color
CA=color

specifies the color for the axis lines and tick marks. The default color is specified by the ContrastColor attribute of the GraphAxisLines style element in the current ODS style. If the NOGSTYLE option is in effect, color is also used for bar outlines and grid lines, unless overridden by the CBARLINE=, CGRID=, or CGRID2= option.

CAXIS2=color

specifies the color for the tick mark labels and axis label that are associated with the cumulative percentage axis. By default, the color specified in the CTEXT= option (or its default) is used.

CBARLINE=color

specifies the color for bar outlines. The default color is specified by the ContrastColor attribute of the GraphOutlines style element in the current ODS style.

CBARS=color
CBARS=(variable-list)

specifies how the bars of the Pareto chart are colored. You can specify the following values:

color

uses a single color for all the bars. You can use this option in conjunction with the CHIGH(n) and CLOW(n) options.

variable-list

uses a distinct color for each bar (or combination of bars). The colors are specified as values of variables in the variable-list. Each variable must be a character variable. You can use the special value EMPTY to indicate that a bar is not to be colored. Note that variable-list must be enclosed in parentheses. You cannot specify a variable-list in conjunction with the CHIGH(n) or CLOW(n) option.

If you specify more than one process variable, you can specify more than one CBARS= variable. The number of CBARS= variables should be less than or equal to the number of process variables. The two lists of variables are paired in order of specification.

If no CBARS= color or variable is specified for a process variable, the bars for its chart are displayed in the default color, which is determined by the Color attribute of the GraphData1 style element in the current ODS style.

If you specify one or more CBARS= variables, you can also use the BARLEGEND= option to add a legend to the chart that explains the significance of each color. Furthermore, you can use the PBARS= option to specify patterns in conjunction with the CBARS= option.
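The following statements sketch how a CBARS= variable and a BARLEGEND= variable might be used together. The data set Failure_Colored and the variables BarColor and Severity are hypothetical, and the example assumes that traditional graphics are available (ODS Graphics disabled and SAS/GRAPH licensed).

```sas
ods graphics off;

/* Hypothetical input data: BarColor holds the color for each bar, */
/* and Severity holds the legend text that explains that color     */
data failure_colored;
   length cause $ 16 barcolor $ 8 severity $ 8;
   input cause $ counts barcolor $ severity $;
   datalines;
Contamination 14 red    Severe
Corrosion      2 yellow Minor
Doping         1 yellow Minor
;

proc pareto data=failure_colored;
   vbar cause / freq      = counts
                cbars     = (barcolor)    /* variable-list must be in parentheses */
                barlegend = (severity);
run;
```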

CCATREF=color

specifies the color for reference lines that are requested in the CATREF= option. The default color is specified by the ContrastColor attribute of the GraphReference style element in the current ODS style.

CCONNECT=color

specifies the color for the line segments that connect the points on the cumulative percentage curve. The default color is determined by the ContrastColor attribute of the GraphDataDefault style element in the current ODS style. You can specify the color for the points on the cumulative percentage curve by using the COLOR= option in the SYMBOL statement.

CCUMREF=color

specifies the color for reference lines that are requested in the CUMREF= option. The default color is specified by the ContrastColor attribute of the GraphReference style element in the current ODS style.

CFRAME=color

specifies the color for filling the area that is enclosed by the axes and the frame. The default color is specified by the Color attribute of the GraphWalls style element in the current ODS style. You cannot use the CFRAME= option in conjunction with the NOFRAME option or the CTILES= option.

CFRAMESIDE=color

specifies the color for filling the frame area for the row labels, which are displayed along the left side of a comparative Pareto chart. If a label is associated with the classification variable, color is also used to fill the frame area for this label. By default, the frame is transparent.

CFRAMETOP=color

specifies the color for filling the frame area for the column labels, which are displayed across the top of a comparative Pareto chart. If a label is associated with the classification variable, color is also used to fill the frame area for this label. By default, the frame is transparent.

CFREQREF=color

specifies the color for reference lines that are requested in the FREQREF= option. The default color is specified by the ContrastColor attribute of the GraphReference style element in the current ODS style.

CGRID=color

specifies the color for frequency axis grid lines. If you specify this option, you do not need to specify the GRID option. The default color is specified by the ContrastColor attribute of the GraphGridLines style element in the current ODS style.

CGRID2=color

specifies the color for cumulative percentage axis grid lines. If you specify this option, you do not need to specify the GRID2 option. The default color is specified by the ContrastColor attribute of the GraphGridLines style element in the current ODS style.

CLIPREF

draws reference lines that are requested in the CATREF=, CUMREF=, and FREQREF= options behind the bars on the Pareto chart. When the GSTYLE option is in effect, reference lines are drawn in front of the bars by default.

COTHER=color

specifies the color for the bar that is defined by the OTHER= option. By default, the CFRAME= color is used. The COTHER= option is not applicable unless a BARS= or CBARS= variable is specified.

CTEXT=color
CT=color

specifies the color for text, such as tick mark labels, axis labels, and legends. The default color is specified by the Color attribute of a style element in the current ODS style. Axis labels use the GraphLabelText style element, and all other text uses the GraphValueText style element.

CTEXTSIDE=color

specifies the color for row labels, which are displayed along the left side of a comparative Pareto chart. If you do not specify a color, the color specified in the CTEXT= option is used. If neither option is specified, the color is determined by the Color attribute of the GraphValueText style element in the current ODS style.

CTEXTTOP=color

specifies the color for column labels, which are displayed across the top of a comparative Pareto chart. If you do not specify a color, the color specified in the CTEXT= option is used. If neither option is specified, the color is determined by the Color attribute of the GraphValueText style element in the current ODS style.

CTILES=(variable)

specifies a character variable whose values are the fill colors for the tiles in a comparative Pareto chart. This option generalizes the CFRAME= option, which provides a single color for all of the tiles. The variable must be enclosed in parentheses. The values of the variable must be identical for all observations that have the same level of the CLASS= variables. You can use the same color to fill more than one tile. You can use the special value EMPTY to indicate that a tile is not to be filled.

You cannot use the CTILES= option in conjunction with the NOFRAME or CFRAME= options. You can use the TILELEGEND= option in conjunction with the CTILES= option to add an explanatory legend for the CTILES= colors at the bottom of the chart. See Output 15.5.1 for an illustration.

DESCRIPTION='string'
DES='string'

specifies a description, up to 256 characters long, for the GRSEG catalog entry for a traditional graphics chart.

FONT=font

specifies a font for text that is used in labels and legends. The default font is determined by the FontFamily, FontStyle, and FontWeight attributes of a style element in the current ODS style; axis labels use the GraphLabelText style element and all other text uses the GraphValueText style element.

FRONTREF

draws reference lines that are requested in the CATREF=, FREQREF=, and CUMREF= options in front of the bars on the Pareto chart. When the NOGSTYLE option is in effect, reference lines are drawn behind the bars by default and can be obscured by them.

HEIGHT=value

specifies the height in screen percentage units of text for labels and legends. This option takes precedence over the HTEXT= option in the GOPTIONS statement. The default value is specified by the FontSize attribute of a style element in the current ODS style; axis labels use the GraphLabelText style element and all other text uses the GraphValueText style element.

HTML=variable

specifies a variable whose values create links that are associated with Pareto bars when traditional graphics output is directed into HTML. You can specify a character variable or a formatted numeric variable. The value of the HTML= variable should be the same for each observation that has a particular value of the process variable.

INFONT=font

specifies a font for bar labels, cumulative percentage curve labels, and sample size legends. This option takes precedence over the FONT= option and the FTEXT= option in the GOPTIONS statement. The default font is determined by the FontFamily, FontStyle, and FontWeight attributes of the GraphValueText style element in the current ODS style.

INHEIGHT=value

specifies the height in screen percentage units of bar labels, cumulative percentage curve labels, and sample size legends. This option takes precedence over the HEIGHT= option and the HTEXT= option in a GOPTIONS statement. The default value is specified by the FontSize attribute of the GraphValueText style element in the current ODS style.

INTERBAR=value

specifies the distance in screen percentage units between bars on the chart. By default, the bars are contiguous.

LCATREF=line-type

specifies the line type for reference lines that are requested in the CATREF= option. The default line type is specified by the LineStyle attribute of the GraphReference style element in the current ODS style.

LCUMREF=line-type

specifies the line type for reference lines that are requested in the CUMREF= option. The default line type is specified by the LineStyle attribute of the GraphReference style element in the current ODS style.

LFREQREF=line-type

specifies the line type for reference lines that are requested in the FREQREF= option. The default line type is specified by the LineStyle attribute of the GraphReference style element in the current ODS style.

LGRID=line-type

specifies the line type for frequency axis grid lines. If you specify this option, you do not need to specify the GRID option. The default line type is specified by the LineStyle attribute of the GraphGridLines style element in the current ODS style.

LGRID2=line-type

specifies the line type for cumulative percentage axis grid lines. If you specify this option, you do not need to specify the GRID2 option. The default line type is specified by the LineStyle attribute of the GraphGridLines style element in the current ODS style.

NAME='string'

specifies the name of the GRSEG catalog entry for a traditional graphics chart, and the name of the graphics output file if one is created. The name can be up to 256 characters long, but the GRSEG name is truncated to eight characters. The default name is "PARETO".

NOFRAME

suppresses the frame that is drawn around the chart by default. You cannot specify the NOFRAME option in conjunction with the CFRAME= or CTILES= options.

PBARS=pattern
PBARS=(variable-list)

specifies pattern fills for the bars. You can specify the following values:

pattern

uses a single pattern for all the bars. You can use this approach in conjunction with the PHIGH(n)= and PLOW(n)= options.

variable-list

uses a distinct pattern for each bar (or combination of bars). You provide the patterns as values of variables in the variable-list. For example, you might use the solid pattern ('S') to indicate severe problems and the empty pattern ('E') for all other problems. Each variable must be a character variable of length eight, and the variable-list must be enclosed in parentheses. You cannot specify a variable-list in conjunction with the PHIGH(n)= or PLOW(n)= option.

If you specify more than one process variable in the chart statement, you can provide more than one variable in the variable-list. The number of variables in the variable-list should be less than or equal to the number of process variables. The two lists of variables are paired in order of specification. If a variable is not provided in the variable-list for a process variable, the bars for that chart are not filled.

If you specify a variable-list, you can also use the BARLEGEND= option to add a legend to the chart that explains the significance of each pattern.

You can use the CBARS= option to specify colors in conjunction with the PBARS= option.

PHIGH(n)=pattern

specifies the pattern for the bars that have the n highest values. You cannot specify this option in conjunction with a PBARS= variable-list, but you can specify this option together with the PLOW(n)= and PBARS=pattern options.

PLOW(n)=pattern

specifies the pattern for the bars that have the n lowest values. You cannot specify this option in conjunction with a PBARS= variable-list, but you can use this option together with the PHIGH(n)= and PBARS=pattern options.

POTHER=pattern

specifies the pattern for the bar that is defined by the OTHER= option. This option applies only if you specify a PBARS= variable-list.

TILELEGEND=(variable)

specifies a variable that is used to add a legend for CTILES= colors. The variable can have a formatted length less than or equal to 32. If a format is associated with the variable, then the formatted value is displayed. You must specify the TILELEGEND= option in conjunction with the CTILES= option. If you specify the CTILES= option but do not specify the TILELEGEND= option, a color legend is not displayed.

The values of the CTILES= and TILELEGEND= variables should be consistent for all observations that have the same level of the CLASS= variables. The value of the TILELEGEND= variable is used to identify the corresponding color value of the CTILES= variable in the legend. See Output 15.5.1 for an illustration.

TILELEGLABEL='label'

specifies a label for the legend that is created when you specify a TILELEGEND= variable. The label can be up to 16 characters and must be enclosed in quotation marks. The default is "Tiles:". See Output 15.5.1 for an illustration.

TURNVLABEL
TURNVLABELS

turns and strings out vertically the characters in the labels for the frequency and cumulative percentage axes. The TURNVLABELS option is valid only in a VBAR statement.

WAXIS=n

specifies the line thickness (in pixels) for the axes and frame. This thickness is also used for bar outlines and grid lines, unless overridden by the WBARLINE=, WGRID=, or WGRID2= option. The default line thickness is specified by the LineThickness attribute of the GraphAxisLines style element in the current ODS style.

WBARLINE=n

specifies the width for bar outlines. The default outline thickness is specified by the LineThickness attribute of the GraphOutlines style element in the current ODS style.

WGRID=n

specifies the width of the frequency axis grid lines. If you specify this option, the GRID option is not required. The default line thickness is specified by the LineThickness attribute of the GraphGridLines style element in the current ODS style.

WGRID2=n

specifies the width of the cumulative percentage axis grid lines. If you specify this option, the GRID2 option is not required. The default line thickness is specified by the LineThickness attribute of the GraphGridLines style element in the current ODS style.

Options for Legacy Line Printer Charts

Note: The HBAR statement does not produce legacy line printer charts, so the following options apply only to the VBAR statement.

CONNECTCHAR='character'
CCHAR='character'

specifies the plot character for line segments that connect points on the cumulative percentage curve. The default character is a plus sign (+).

HREFCHAR='character'

specifies the plot character used to form the lines that are requested in the CATREF= option. The default character is a vertical bar (|).

SYMBOLCHAR='character'

specifies the plot character for points on the cumulative percentage curve. The default character is an asterisk (*).

VREFCHAR='character'

specifies the character to be used to form the lines that are requested in the FREQREF= and CUMREF= options. The default character is a dash (-).