The PARETO Procedure

Terminology

Basic Pareto Charts

A basic Pareto chart (see Figure 15.1) analyzes the unique values of a process variable. These values are called Pareto categories or levels, and they usually represent problems that are encountered during some phase of a manufacturing or service activity.

A basic vertical Pareto chart (as produced by the PARETO procedure’s VBAR statement) has one horizontal axis and two vertical axes:

  • The category axis is displayed horizontally at the bottom of the chart and lists the Pareto categories.

  • The frequency axis (or primary vertical axis) is displayed on the left. The relative frequency of each Pareto category is represented by a vertical bar whose height is measured on the frequency axis. You can use the SCALE= option to scale this axis in percentage, count, or weight units.

  • The cumulative percentage axis (or secondary vertical axis) is displayed on the right. This axis is scaled in cumulative percentage units and is used to read the cumulative percentage curve. The height of each point on the curve represents the percentage of the total frequency that is accounted for by the Pareto categories to the left of the point.
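The relationship between the two vertical axes can be illustrated with a short Python sketch. This is not PROC PARETO code: the function name `pareto_axes`, the tie-breaking rule, and the defect counts are assumptions made up for illustration, and each point on the curve is assumed to include its own category.

```python
def pareto_axes(counts):
    """Return (category, percent, cumulative percent) rows in Pareto order."""
    total = sum(counts.values())
    # Sort by decreasing frequency; breaking ties by name is an assumption.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        # Bar height in percent units, then the height of the curve point.
        rows.append((category, 100.0 * count / total, 100.0 * running / total))
    return rows


# Hypothetical defect counts, invented for this example.
defects = {
    "Contamination": 14,
    "Oxide Defect": 8,
    "Metallization": 2,
    "Silicon Defect": 1,
}
rows = pareto_axes(defects)
for category, pct, cum_pct in rows:
    print(f"{category:15s} {pct:6.1f} {cum_pct:6.1f}")
```

The last point on the curve always reaches 100 percent, because by then every category has been accumulated.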

A horizontal Pareto chart (as produced by the HBAR statement) is essentially a vertical Pareto chart rotated 90 degrees clockwise. The category axis is displayed vertically on the left. Categories appear in order of decreasing relative frequency from top to bottom. The frequency axis appears at the top of the chart, and the cumulative percentage axis appears at the bottom. The relative frequencies of the Pareto categories are represented by horizontal bars. A point on the cumulative percentage curve represents the percentage of the total frequency that is accounted for by the Pareto categories above that point.

Note: For the sake of brevity, in this chapter the term height refers to the size of a bar as measured along the frequency axis, whether the Pareto chart is oriented vertically or horizontally.

Restricted Pareto Charts

A restricted Pareto chart (see Figure 15.6) displays only the $n$ most frequently occurring categories in a data set that contains $N$ categories, where $N>n$. The remaining $N-n$ categories are dropped or are merged into a single "other" category that is created when you specify the OTHER= option. The MAXCMPCT=, MAXNCAT=, and MINPCT= options provide alternative methods for specifying $n$. See the entries for these options in the section "Dictionary of HBAR and VBAR Statement Options."
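The restriction described above can be sketched in Python. This is an illustration of the idea only, not the procedure's implementation: the function `restrict_categories`, its parameters, the sample counts, and the choice to place the merged category last are all assumptions.

```python
def restrict_categories(counts, n, other_label=None):
    """Keep the n most frequent categories; optionally merge the rest."""
    # Pareto order: decreasing frequency, ties broken by name (an assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = ordered[:n]
    dropped = ordered[n:]
    if other_label is not None and dropped:
        # Merge the remaining N - n categories into a single category.
        kept.append((other_label, sum(count for _, count in dropped)))
    return kept


# Hypothetical counts for N = 5 categories.
counts = {"A": 20, "B": 12, "C": 5, "D": 2, "E": 1}
dropped_version = restrict_categories(counts, n=3)
merged_version = restrict_categories(counts, n=3, other_label="Other")
print(dropped_version)
print(merged_version)
```

With the rest dropped, the chart shows three bars. With merging, a fourth bar carries the combined frequency of D and E.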

Weighted Pareto Charts

A weighted Pareto chart (see Example 15.8) displays bars whose heights represent the weighted frequencies of the categories. Typical weights are the cost of repair or the loss incurred by the customer.

The weight $W_{i}$ for the $i$th Pareto category is computed as

\[  W_{i} = \sum_{u \in \mathcal{C}_{i}} w(u) f(u)  \]

where $\mathcal{C}_{i}$ is the set of observations that make up the $i$th category, $w(u)$ is the value of the weight variable in the $u$th observation, and $f(u)$ is the value of the frequency variable in the $u$th observation (taking $f(u) \equiv 1$ if a FREQ= variable is not specified). If SCALE=WEIGHT is specified, the height of the bar for the $i$th category is $W_{i}$. If SCALE=PERCENT is specified, the height of this bar is

\[  \frac{100 W_{i}}{\sum_{j=1}^{N} W_{j}}  \]

where $N$ is the total number of categories.
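As a worked illustration of the two formulas above, the following Python sketch accumulates the product of weight and frequency per category and then rescales to percentages. The function name `weighted_heights`, the tuple layout, and the sample observations are hypothetical.

```python
from collections import defaultdict


def weighted_heights(observations):
    """observations: iterable of (category, weight, frequency) tuples.

    Returns (weights, percents): per-category weighted frequencies and
    the same values rescaled to percentages of the total.
    """
    weights = defaultdict(float)
    for category, w, f in observations:
        # Each observation contributes w(u) * f(u) to its category.
        weights[category] += w * f
    total = sum(weights.values())
    percents = {c: 100.0 * value / total for c, value in weights.items()}
    return dict(weights), percents


# Hypothetical observations: (category, repair cost, frequency).
obs = [
    ("Scratch", 2.0, 3),
    ("Scratch", 1.0, 2),
    ("Dent", 4.0, 3),
]
weights, percents = weighted_heights(obs)
print(weights)
print(percents)
```

Here Scratch accumulates 2.0 × 3 + 1.0 × 2 = 8 and Dent accumulates 4.0 × 3 = 12, so Dent has the taller bar even though Scratch occurs more often (5 occurrences versus 3).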

Comparative Pareto Charts

A comparative Pareto chart combines two or more Pareto charts for the same process variable. The component charts are displayed with uniform axes to facilitate comparison. The observations that are represented by a component chart are called a cell. The framed areas for the component charts are called tiles.

In a one-way comparative Pareto chart, each component chart corresponds to a different level of a single classification variable, which is specified in the CLASS= option. The component charts are arranged in a stack or a row, as illustrated in Output 15.1.3, Output 15.1.4, Output 15.2.2, and Output 15.2.3. In a two-way comparative Pareto chart, each component chart corresponds to a different combination of levels of two classification variables, which are specified in the CLASS= option. The component charts are arranged in a matrix, as illustrated in Output 15.2.4.

Every comparative Pareto chart has a key cell. The bars in the key cell are displayed in decreasing order, and that order is imposed on all the other cells to achieve a uniform category axis. By default, the key cell is the cell in the upper left corner, but you can use the CLASSKEY= option to designate any other cell as the key cell. If you designate another cell as the key cell, the rows and columns of the comparative chart are rearranged so that the key cell appears in the upper left corner. However, if you require the rows and columns in a particular order, you can specify the NOKEYMOVE option in conjunction with the CLASSKEY= option to suppress the rearrangement.

You can use the NROWS= and NCOLS= options to specify the numbers of rows and columns in a comparative Pareto chart. By default, NROWS=2 and NCOLS=1 for a one-way comparison and NROWS=2 and NCOLS=2 for a two-way comparison. There is no upper limit to the number of rows or columns that you can specify, but in practice the limit is determined by the area of the graphical display. If the numbers of classification variable levels exceed the NROWS= and NCOLS= values, the chart is created on multiple panels or pages.

If the same set of Pareto categories does not occur in each cell of a comparative Pareto chart, the categories are said to be unbalanced. In this case, PROC PARETO uses the following convention to construct the uniform category axis. First, the categories that occur in the key cell are arranged on the category axis from left to right (top to bottom for a horizontal chart) and sorted in decreasing order of frequency, with tied levels arranged in order of their formatted values. The categories not in the key cell are assigned frequencies of 0 in the key cell, and they are arranged at the right (bottom) of the category axis, where they are ordered by their formatted values. This arrangement is simply a convention of the PARETO procedure and should not be interpreted to mean that one category is more important than another.
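The ordering convention for unbalanced categories can be sketched as follows. This Python fragment is an illustration only: the function `uniform_category_axis` and the sample data are hypothetical, and plain strings stand in for formatted values.

```python
def uniform_category_axis(key_cell_counts, all_categories):
    """Build the shared category axis for a comparative chart."""
    # Categories in the key cell: decreasing frequency, ties by formatted value.
    in_key = sorted(key_cell_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    axis = [category for category, _ in in_key]
    # Categories absent from the key cell go last, ordered by formatted value.
    missing = sorted(c for c in all_categories if c not in key_cell_counts)
    return axis + missing


# Hypothetical key cell counts and the full set of categories across cells.
key_cell = {"Solder": 5, "Crack": 9, "Bend": 5}
everything = {"Solder", "Crack", "Bend", "Warp", "Chip"}
axis = uniform_category_axis(key_cell, everything)
print(axis)
```

In this example Bend and Solder are tied at 5 and therefore appear in alphabetical order, while Chip and Warp, which do not occur in the key cell, are placed at the end.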

Whether the categories in the input data set are balanced or not, the categories in the OUT= data set are always balanced. PROC PARETO balances this data set by assigning values of 0 to the _COUNT_ and _PCT_ variables as necessary.
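The balancing step can be pictured with a small Python sketch. It covers counts only, and the function `balance_cells` and the cell names are hypothetical; it is not a description of how the OUT= data set is actually written.

```python
def balance_cells(cells):
    """Give every cell an entry for every category, filling gaps with 0."""
    categories = sorted({c for counts in cells.values() for c in counts})
    return {
        cell: {c: counts.get(c, 0) for c in categories}
        for cell, counts in cells.items()
    }


# Hypothetical cells with unbalanced categories.
cells = {
    "Line 1": {"Crack": 9, "Bend": 5},
    "Line 2": {"Crack": 4, "Warp": 2},
}
balanced = balance_cells(cells)
print(balanced)
```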

Unbalanced categories present a special problem when the MAXNCAT= option is used to restrict the number of categories that are displayed on the chart. For example, suppose that you specify MAXNCAT=12 and there are 15 categories in all, 10 of which occur in the key cell. Because there is no unambiguous method for selecting two of the remaining five categories to complete the restricted list, the PARETO procedure reduces the restricted list to the categories that occur in the key cell and displays only those 10 categories. A warning message is issued in the SAS log.
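The example above can be traced with a short Python sketch. The function `restricted_list` and the category labels are hypothetical, and the branch in which the key cell already supplies enough categories is an assumption about the straightforward case rather than documented behavior.

```python
def restricted_list(key_cell_order, n_total, maxncat):
    """Return (categories to display, whether a warning would be issued).

    key_cell_order lists the key cell's categories in Pareto order;
    n_total is the number of categories across all cells.
    """
    limit = min(maxncat, n_total)
    if len(key_cell_order) >= limit:
        # Assumed case: the key cell supplies enough categories by itself.
        return list(key_cell_order[:limit]), False
    # Too few key cell categories: the remaining ones cannot be ranked
    # unambiguously, so only the key cell's categories are displayed.
    return list(key_cell_order), True


# The scenario from the text: 15 categories, 10 in the key cell, MAXNCAT=12.
key_cell_order = [f"C{i:02d}" for i in range(1, 11)]
shown, warned = restricted_list(key_cell_order, n_total=15, maxncat=12)
print(len(shown), warned)
```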