Details and Examples: PARETO Procedure

Terminology

Basic Pareto Charts

A basic Pareto chart (see Figure 15.6) analyzes the unique values of a process variable, which are referred to as Pareto categories or levels. These values typically represent problems encountered during some phase of a manufacturing or service activity.

A basic vertical Pareto chart (as produced by the PARETO procedure’s VBAR statement) has one horizontal and two vertical axes. The horizontal (or category) axis is displayed at the bottom of the chart and lists the Pareto categories. The primary vertical axis (or frequency axis) is displayed on the left. The relative frequency of each Pareto category is represented by a vertical bar whose height is measured on the primary vertical axis. You can use the SCALE= option to scale this axis in percent, count, or weight units. The secondary vertical axis (or cumulative percent axis) is displayed on the right. This axis is scaled in cumulative percent units and is used to read the cumulative percent curve. The height of each point on the curve represents the percent of the total frequency accounted for by the Pareto category at that point together with all categories to its left.

In a horizontal Pareto chart (as produced by the HBAR statement), the category axis is displayed vertically on the left. Categories appear in order of decreasing relative frequency from top to bottom. The frequency axis appears at the top of the chart and the cumulative percent axis is at the bottom. The relative frequencies of the Pareto categories are represented by horizontal bars. A point on the cumulative percent curve represents the percent of the total frequency accounted for by the Pareto category at that point together with all categories above it.

Note: For the sake of brevity, in this chapter the term height is used to refer to the size of a bar as measured along the frequency axis, whether the Pareto chart is oriented vertically or horizontally.
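
The relationship between bar heights and the cumulative percent curve can be illustrated with a small calculation. The following Python sketch is not part of the PARETO procedure; the function name pareto_table, the sample data, and the tie-breaking rule (alphabetical by category name) are assumptions made for illustration only.

```python
from collections import Counter

def pareto_table(values):
    """Return (category, count, percent, cumulative percent) rows,
    ordered by decreasing count. Ties are broken by category name,
    which is an assumption of this sketch."""
    counts = Counter(values)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    rows = []
    running = 0
    for category, count in ordered:
        running += count  # includes the current category
        rows.append((category, count,
                     100.0 * count / total,
                     100.0 * running / total))
    return rows

# Hypothetical log of problems, one entry per occurrence
problems = ["Contamination"] * 5 + ["Oxide Defect"] * 3 + ["Corrosion"] * 2
for row in pareto_table(problems):
    print(row)
```

The second and third fields of each row correspond to the bar height in count and percent units, and the last field is the value that would be read from the cumulative percent axis.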

Restricted Pareto Charts

A restricted Pareto chart (see Figure 15.10) displays only the n most frequently occurring categories in a data set that contains N categories, where $N>n$. The remaining $N-n$ categories are either dropped or merged into a single "other" category created with the OTHER= option. The MAXCMPCT=, MAXNCAT=, and MINPCT= options provide alternative methods for specifying n. See the entries for these options in the section Dictionary of Options.
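
As a rough illustration of the restriction, the following Python sketch keeps the n most frequent categories and either drops the rest or merges them into one extra category. It is a sketch under stated assumptions, not the procedure's algorithm: the function restrict_categories, the sample counts, the tie-breaking rule, and the placement of the merged category at the end of the list are all hypothetical.

```python
def restrict_categories(counts, n, other_label=None):
    """Keep the n most frequent categories from a {category: count} dict.

    If other_label is None the remaining categories are dropped;
    otherwise their counts are summed into one category with that label,
    appended last (the placement is an assumption of this sketch)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    kept = ordered[:n]
    rest = ordered[n:]
    if other_label is not None and rest:
        kept.append((other_label, sum(count for _, count in rest)))
    return kept

# Hypothetical counts for N = 5 categories, restricted to n = 3
counts = {"A": 40, "B": 25, "C": 15, "D": 12, "E": 8}
top3 = restrict_categories(counts, 3)
top3_other = restrict_categories(counts, 3, other_label="Other")
print(top3)
print(top3_other)
```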

Weighted Pareto Charts

A weighted Pareto chart (see Example 15.8) displays bars whose heights represent the weighted frequencies of the categories. Typical weights are the cost of repair or the loss incurred by the customer.

The weight $W_{i}$ for the ith Pareto category is computed as

\[ W_{i} = \sum_{u \in \mathcal{C}_i} w(u) f(u) \]

where $\mathcal{C}_i$ is the set of observations that make up the ith category, $w(u)$ is the value of the weight variable in the uth observation, and $f(u)$ is the value of the frequency variable in the uth observation (taking $f(u) \equiv 1$ if a FREQ= variable is not specified). If SCALE=WEIGHT is specified, the height of the bar for the ith category is $W_{i}$. If SCALE=PERCENT is specified, the height of this bar is

\[  \frac{100 W_{i}}{ \sum _{j=1}^{N} W_{j} }  \]

where N is the total number of categories.
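
The two formulas above can be checked numerically. The following Python sketch accumulates $w(u) f(u)$ per category and then converts the result to percent units. The function weighted_heights, the tuple layout of the observations, and the sample values are assumptions for illustration; the scale argument only mimics the WEIGHT and PERCENT settings described above.

```python
from collections import defaultdict

def weighted_heights(observations, scale="WEIGHT"):
    """observations: iterable of (category, w, f) tuples, where w is the
    weight value and f the frequency value (use f = 1 when there is no
    frequency variable). Returns {category: bar height}."""
    totals = defaultdict(float)
    for category, w, f in observations:
        totals[category] += w * f          # W_i = sum of w(u) * f(u)
    if scale == "WEIGHT":
        return dict(totals)
    if scale == "PERCENT":
        grand_total = sum(totals.values())
        return {c: 100.0 * W / grand_total for c, W in totals.items()}
    raise ValueError("scale must be 'WEIGHT' or 'PERCENT' in this sketch")

# Hypothetical observations: (category, repair cost, frequency)
obs = [("Scratch", 2.0, 3), ("Dent", 10.0, 1),
       ("Scratch", 2.0, 1), ("Stain", 1.0, 2)]
print(weighted_heights(obs))                   # heights in weight units
print(weighted_heights(obs, scale="PERCENT"))  # heights in percent units
```

In this example the Dent category has the tallest bar even though it occurs only once, because its weight dominates the weighted sum.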

Comparative Pareto Charts

A comparative Pareto chart combines two or more Pareto charts for the same process variable. The component charts are displayed with uniform axes to facilitate comparison. The observations represented by a component chart are referred to as a cell. The framed areas for the component charts are referred to as tiles.

In a one-way comparative Pareto chart, each component chart corresponds to a different level of a single classification variable specified with the CLASS= option. The component charts are arranged in a stack or a row, as illustrated in Output 15.1.3, Output 15.2.2, and Output 15.2.3. In a two-way comparative Pareto chart, each component chart corresponds to a different combination of levels of two classification variables specified with the CLASS= option. The component charts are arranged in a matrix, as illustrated in Output 15.2.4.

In any comparative Pareto chart there is a key cell, in which the bars are in decreasing order and whose order is imposed on all the other cells to achieve a uniform category axis. By default, the key cell is the cell in the upper left corner, but you can use the CLASSKEY= option to designate any other cell as the key cell. In this case, the rows and columns of the comparative chart will be rearranged so that the key cell appears in the upper left. However, if you require the rows and columns in a particular order, you can specify the NOKEYMOVE option in conjunction with the CLASSKEY= option to suppress the rearrangement.

If you are creating traditional graphics or ODS Graphics output, you can use the NROWS= and NCOLS= options to specify the numbers of rows and columns in a comparative Pareto chart. By default, NROWS=2 and NCOLS=1 for a one-way comparison and NROWS=2 and NCOLS=2 for a two-way comparison. There is no upper limit to the number of rows or columns that you can specify, but in practice the limit is determined by the area of the graphical display. If the numbers of classification variable levels exceed the NROWS= and NCOLS= values, the chart is created on multiple screens or pages.

If the same set of Pareto categories does not occur in each cell of a comparative Pareto chart, the categories are said to be unbalanced. In this case, the procedure uses the following convention to construct the uniform category axis. First, the categories that occur in the key cell are arranged on the category axis from left to right (top to bottom for a horizontal chart), sorted in decreasing order of frequency, with tied levels arranged in order of their formatted values. The categories not in the key cell are assigned frequencies of zero in the key cell, and they are arranged at the right (bottom) of the category axis, where they are ordered by their formatted values. This arrangement is simply a convention of the procedure and should not be interpreted to mean that one category is more important than another.
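
The ordering convention for unbalanced categories can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the procedure's implementation: the function uniform_category_axis and the sample data are hypothetical, and formatted values are approximated by the string form of each category.

```python
def uniform_category_axis(key_cell_counts, all_categories):
    """Build the shared category axis for a comparative chart.

    key_cell_counts: {category: frequency} for the key cell only.
    all_categories:  every category that occurs in any cell.
    Formatted values are approximated with str(), an assumption."""
    in_key = sorted(
        (c for c in all_categories if c in key_cell_counts),
        key=lambda c: (-key_cell_counts[c], str(c)),  # frequency, then ties
    )
    not_in_key = sorted(
        (c for c in all_categories if c not in key_cell_counts),
        key=str,  # treated as frequency zero in the key cell
    )
    return in_key + not_in_key

# Hypothetical key cell with a tie between two categories
key_cell = {"Contamination": 8, "Corrosion": 3, "Doping": 3}
everything = {"Contamination", "Corrosion", "Doping",
              "Silicon Defect", "Metallization"}
axis = uniform_category_axis(key_cell, everything)
print(axis)
```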

Whether the categories in the input data set are balanced or not, the categories in the OUT= data set are always balanced. The procedure balances this data set by assigning values of zero to the _COUNT_ and _PCT_ variables as necessary.

Unbalanced categories present a special problem when the MAXNCAT= option is used to restrict the number of categories displayed on the chart. For instance, suppose that you specify MAXNCAT=12 and there are 15 categories in all, 10 of which occur in the key cell. Because there is no unambiguous method for selecting two of the remaining five categories to complete the restricted list, the procedure reduces the restricted list to the categories that occur in the key cell and displays only those 10 categories. A warning message is issued in the SAS log.
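
The MAXNCAT=12 example can be traced with the following Python sketch. The function restricted_axis and the category names are hypothetical, and the sketch assumes that the axis already lists the key-cell categories first. The behavior when the requested number is at least the total number of categories (no restriction, no warning) is also an assumption, because the text above does not cover that case.

```python
def restricted_axis(axis, n_key, maxncat):
    """Return (categories to display, warning flag).

    axis:    uniform category axis, key-cell categories first.
    n_key:   number of categories that occur in the key cell.
    maxncat: requested maximum number of categories."""
    if maxncat >= len(axis):
        return list(axis), False   # assumption: nothing to restrict
    if maxncat > n_key:
        # No unambiguous way to choose among the non-key categories,
        # so fall back to the key-cell categories and flag a warning.
        return list(axis[:n_key]), True
    return list(axis[:maxncat]), False

# 15 categories in all, 10 of which occur in the key cell
axis = [f"K{i}" for i in range(1, 11)] + [f"X{i}" for i in range(1, 6)]
shown, warned = restricted_axis(axis, n_key=10, maxncat=12)
print(len(shown), warned)
```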