The PARETO Procedure

Example 15.1 Creating Before-and-After Pareto Charts

Note: See Before & After Pareto Charts Using a BY Variable in the SAS/QC Sample Library.

During the manufacture of a metal-oxide semiconductor (MOS) capacitor, the causes of failure were recorded before and after a tube in the diffusion furnace was cleaned. This information was saved in a SAS data set named Failure3:

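data Failure3;
   length Cause $ 16 Stage $ 16;
   label  Cause = 'Cause of Failure';
   /* The & modifier lets Stage and Cause contain single embedded */
   /* blanks; values must be separated by two or more blanks.     */
   input  Stage & $ Cause & $ Counts;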
datalines;
Before Cleaning   Contamination    14
Before Cleaning   Corrosion         2
Before Cleaning   Doping            1
Before Cleaning   Metallization     2
Before Cleaning   Miscellaneous     3
Before Cleaning   Oxide Defect      8
Before Cleaning   Silicon Defect    1
After Cleaning    Doping            0
After Cleaning    Corrosion         2
After Cleaning    Metallization     4
After Cleaning    Miscellaneous     2
After Cleaning    Oxide Defect      1
After Cleaning    Contamination    12
After Cleaning    Silicon Defect    2
;

To compare the distribution of failures before and after cleaning, you can use the BY statement to create two separate Pareto charts, one for the observations in which Stage is equal to Before Cleaning and one for the observations in which Stage is equal to After Cleaning:

proc sort data=Failure3;
   by Stage;
run;

title 'Pareto Effect of Furnace Tube';
proc pareto data=Failure3;
   vbar Cause / freq     = Counts
                odstitle = title;
   by Stage;
run;

The SORT procedure sorts the observations by the values of Stage, as BY processing requires. It is not necessary to sort by the values of Cause, because the PARETO procedure orders the categories itself. The two charts, displayed in Output 15.1.1 and Output 15.1.2, reveal a reduction in oxide defects after the tube was cleaned. This is a relative reduction, because the frequency axes are scaled in percentage units. Note that the After Cleaning chart is produced first, because BY groups are processed in alphabetical order of Stage.
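The relative reduction can be checked directly from the counts in Failure3. The Before Cleaning group contains 14 + 2 + 1 + 2 + 3 + 8 + 1 = 31 failures and the After Cleaning group contains 0 + 2 + 4 + 2 + 1 + 12 + 2 = 23 failures, so the bar heights on the two percentage axes work out as follows for the two largest causes:

```latex
\begin{aligned}
\text{Oxide Defect:}\quad  & \tfrac{8}{31}  \approx 25.8\% \;\longrightarrow\; \tfrac{1}{23}  \approx 4.3\% \\
\text{Contamination:}\quad & \tfrac{14}{31} \approx 45.2\% \;\longrightarrow\; \tfrac{12}{23} \approx 52.2\%
\end{aligned}
```

The contamination row shows why the percentage scale describes relative change: its count drops from 14 to 12, yet its share of all failures rises.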

Output 15.1.1: “After” Analysis Using Stage as a BY Variable

Output 15.1.2: “Before” Analysis Using Stage as a BY Variable

In general, it is difficult to compare Pareto charts that are created by using BY processing because their axes are not necessarily uniform. A better approach is to construct a comparative Pareto chart, as illustrated by the following statements:

title 'Comparison of Integrated Circuit Failures';
proc pareto data=Failure3;
   vbar Cause / class      = Stage
                freq       = Counts
                scale      = percent
                intertile  = 5.0
                classkey   = 'Before Cleaning'
                odstitle   = title;
run;

The CLASS= option designates Stage as a classification variable, which directs PROC PARETO to create the one-way comparative Pareto chart shown in Output 15.1.3, with a component chart for each level of Stage. The SCALE=PERCENT option scales the frequency axes in percentage units, and the INTERTILE= option separates the cells with an offset of 5 screen percentage units.

Output 15.1.3: Before-and-After Analysis That Uses a Comparative Pareto Chart

In a comparative Pareto chart, there is always one special cell, called the key cell, in which the bars are displayed in decreasing order, and whose order determines the uniform category axis that is used for all the cells. The key cell is positioned at the top of the chart. Here, the key cell is the set of observations for which Stage equals Before Cleaning, as specified by the CLASSKEY= option. By default, the levels are sorted in the order determined by the ORDER1= option, and the key cell is the level that occurs first in this order.
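To see the category order that the key cell imposes, you can sort the Before Cleaning observations by decreasing count. The following is a minimal sketch that assumes the Failure3 data set created above is available in the current session; the output data set name KeyOrder is hypothetical. PROC SORT keeps tied observations (for example, Corrosion and Metallization, each with 2 failures) in their input order, which is not necessarily how PROC PARETO breaks ties.

```sas
/* Sketch: list the key-cell categories in decreasing frequency. */
/* Assumes Failure3 from the DATA step above; KeyOrder is a      */
/* hypothetical output data set name.                            */
proc sort data=Failure3(where=(Stage = 'Before Cleaning'))
          out=KeyOrder;
   by descending Counts;   /* largest category first */
run;

/* Display the resulting category order */
proc print data=KeyOrder noobs;
   var Cause Counts;
run;
```

The listing begins with Contamination (14) and Oxide Defect (8). Because the After Cleaning cell uses this same category axis, its bars are not necessarily in decreasing order: there, Metallization (4) outranks Oxide Defect (1) but is plotted after it.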

In many applications, it can be more revealing to base comparisons on counts rather than percentages. The following statements construct a chart that has a frequency scale:

title 'Comparison of Integrated Circuit Failures';
proc pareto data=Failure3;
   vbar Cause / class     = Stage
                freq      = Counts
                scale     = count
                nlegend   = 'Total Circuits'
                classkey  = 'Before Cleaning'
                odstitle  = title
                cframenleg
                cprop;
run;

Specifying SCALE=COUNT scales the frequency axis in count units. The NLEGEND= option adds a sample size legend, and the CFRAMENLEG option frames the legend. The CPROP option adds bars that indicate the proportion of total frequency represented by each cell.
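The proportion bars requested by CPROP compare the cell totals, which are the sums of Counts within each level of Stage. For these data the arithmetic is:

```latex
\begin{aligned}
N_{\text{Before}} &= 14 + 2 + 1 + 2 + 3 + 8 + 1 = 31, \qquad
N_{\text{After}}   = 0 + 2 + 4 + 2 + 1 + 12 + 2 = 23, \\
\frac{N_{\text{Before}}}{N_{\text{Before}} + N_{\text{After}}} &= \frac{31}{54} \approx 57.4\%, \qquad
\frac{N_{\text{After}}}{N_{\text{Before}} + N_{\text{After}}}   = \frac{23}{54} \approx 42.6\%.
\end{aligned}
```

These unequal cell totals are why a shared count axis and per-cell percentages can suggest different comparisons for the same data.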

The chart is shown in Output 15.1.4.

Output 15.1.4: Before-and-After Analysis Using Comparative Pareto Chart

Note that the lower cumulative percentage curve in Output 15.1.4 is not anchored to the first bar. This is a consequence of the uniform frequency scale and of the fact that the number of observations in each cell is not the same.