Note: See Before & After Pareto Charts Using a BY Variable in the SAS/QC Sample Library.
During the manufacture of a metal-oxide semiconductor (MOS) capacitor, causes of failures were recorded before and after a tube in the diffusion furnace was cleaned. This information was saved in a SAS data set named Failure3:
data Failure3;
   length Cause $ 16 Stage $ 16;
   label Cause = 'Cause of Failure';
   /* The & modifier allows single blanks inside a value, */
   /* so fields must be separated by two or more blanks.  */
   input Stage & $ Cause & $ Counts;
   datalines;
Before Cleaning   Contamination    14
Before Cleaning   Corrosion         2
Before Cleaning   Doping            1
Before Cleaning   Metallization     2
Before Cleaning   Miscellaneous     3
Before Cleaning   Oxide Defect      8
Before Cleaning   Silicon Defect    1
After Cleaning    Doping            0
After Cleaning    Corrosion         2
After Cleaning    Metallization     4
After Cleaning    Miscellaneous     2
After Cleaning    Oxide Defect      1
After Cleaning    Contamination    12
After Cleaning    Silicon Defect    2
;
To compare the distribution of failures before and after cleaning, you can use the BY statement to create two separate Pareto charts, one for the observations in which Stage is equal to Before Cleaning and one for the observations in which Stage is equal to After Cleaning:
proc sort data=Failure3;
   by Stage;
run;

title 'Pareto Effect of Furnace Tube';
proc pareto data=Failure3;
   vbar Cause / freq     = Counts
                odstitle = title;
   by Stage;
run;
The SORT procedure sorts the observations in order of the values of Stage. It is not necessary to sort by the values of Cause because this is done by the PARETO procedure. The two charts, displayed in Output 15.1.1 and Output 15.1.2, reveal a reduction in oxide defects after the tube was cleaned. This is a relative reduction, because the frequency axes are scaled in percentage units. Note that the After Cleaning chart is produced first, based on alphabetical sorting of BY groups.
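To check the percentages behind the two BY-group charts numerically, a minimal sketch such as the following can be used. It is not part of the original example; it assumes the sorted Failure3 data set created above and uses the FREQ procedure, with the WEIGHT statement treating Counts as the frequency of each row.

```sas
/* Sketch only: tabulate the percentage of failures by cause */
/* within each stage. Failure3 must already be sorted by Stage. */
proc freq data=Failure3;
   by Stage;
   weight Counts / zeros;   /* ZEROS keeps the zero-count Doping row */
   tables Cause / nocum;    /* NOCUM suppresses cumulative columns   */
run;
```

From the data listed above, oxide defects account for 8 of 31 failures (about 26%) before cleaning and 1 of 23 failures (about 4%) after cleaning.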
In general, it is difficult to compare Pareto charts that are created by using BY processing because their axes are not necessarily uniform. A better approach is to construct a comparative Pareto chart, as illustrated by the following statements:
title 'Comparison of Integrated Circuit Failures';
proc pareto data=Failure3;
   vbar Cause / class     = Stage
                freq      = Counts
                scale     = percent
                intertile = 5.0
                classkey  = 'Before Cleaning'
                odstitle  = title;
run;
The CLASS= option designates Stage as a classification variable, and this directs PROC PARETO to create the one-way comparative Pareto chart shown in Output 15.1.3, which displays a component chart for each level of Stage. The INTERTILE= option separates the cells with an offset of 5 screen percentage units.
In a comparative Pareto chart, there is always one special cell, called the key cell, in which the bars are displayed in decreasing order, and whose order determines the uniform category axis that is used for all the cells. The key cell is positioned at the top of the chart. Here, the key cell is the set of observations for which Stage equals Before Cleaning, as specified by the CLASSKEY= option. By default, the levels are sorted in the order determined by the ORDER1= option, and the key cell is the level that occurs first in this order.
In many applications, it can be more revealing to base comparisons on counts rather than percentages. The following statements construct a chart that has a frequency scale:
title 'Comparison of Integrated Circuit Failures';
proc pareto data=Failure3;
   vbar Cause / class    = Stage
                freq     = Counts
                scale    = count
                nlegend  = 'Total Circuits'
                classkey = 'Before Cleaning'
                odstitle = title
                cframenleg
                cprop;
run;
Specifying SCALE=COUNT scales the frequency axis in count units. The NLEGEND= option adds a sample size legend, and the CFRAMENLEG option frames the legend. The CPROP option adds bars that indicate the proportion of total frequency represented by each cell.
The chart is shown in Output 15.1.4.
Note that the lower cumulative percentage curve in Output 15.1.4 is not anchored to the first bar. This is a consequence of the uniform frequency scale and of the fact that the total number of failures in each cell is not the same.
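The difference in cell totals can be confirmed with a short sketch like the one below. It is not part of the original example; it assumes the Failure3 data set created above and uses the MEANS procedure to sum Counts within each level of Stage.

```sas
/* Sketch only: total failure count for each stage */
proc means data=Failure3 sum;
   class Stage;
   var Counts;
run;
```

With the data listed above, the totals are 31 failures before cleaning and 23 after cleaning, so the two cells share a count axis but do not share a total.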