
Note: See Basic and Comparative Pareto Charts in the SAS/QC Sample Library.
During the manufacture of a MOS capacitor, different cleaning processes were used by two manufacturing systems operating in
parallel. Process A used a standard cleaning solution, and Process B used a different cleaning mixture that contained less
particulate matter. The failure causes that were observed with each process for five consecutive days were recorded and saved
in a SAS data set called Failure4:
data Failure4; length Process $ 9 Cause $ 16; label Cause = 'Cause of Failure'; input Process & $ Day & $ Cause & $ Counts; datalines; Process A March 1 Contamination 15 Process A March 1 Corrosion 2 Process A March 1 Doping 1 Process A March 1 Metallization 2 Process A March 1 Miscellaneous 3 Process A March 1 Oxide Defect 8 Process A March 1 Silicon Defect 1 Process A March 2 Contamination 16 Process A March 2 Corrosion 3 Process A March 2 Doping 1 Process A March 2 Metallization 3 Process A March 2 Miscellaneous 1 Process A March 2 Oxide Defect 9 Process A March 2 Silicon Defect 2 Process A March 3 Contamination 20 Process A March 3 Corrosion 1 Process A March 3 Doping 1 Process A March 3 Metallization 0 Process A March 3 Miscellaneous 3 Process A March 3 Oxide Defect 7 Process A March 3 Silicon Defect 2 Process A March 4 Contamination 12 Process A March 4 Corrosion 1 Process A March 4 Doping 1 Process A March 4 Metallization 0 Process A March 4 Miscellaneous 0 Process A March 4 Oxide Defect 10 Process A March 4 Silicon Defect 1 Process A March 5 Contamination 23 Process A March 5 Corrosion 1 Process A March 5 Doping 1 Process A March 5 Metallization 0 Process A March 5 Miscellaneous 1 Process A March 5 Oxide Defect 8 Process A March 5 Silicon Defect 2 Process B March 1 Contamination 8 Process B March 1 Corrosion 2 Process B March 1 Doping 1 Process B March 1 Metallization 4 Process B March 1 Miscellaneous 2 Process B March 1 Oxide Defect 10 Process B March 1 Silicon Defect 3 Process B March 2 Contamination 9 Process B March 2 Corrosion 0 Process B March 2 Doping 1 Process B March 2 Metallization 2 Process B March 2 Miscellaneous 4 Process B March 2 Oxide Defect 9 Process B March 2 Silicon Defect 2 Process B March 3 Contamination 4 Process B March 3 Corrosion 1 Process B March 3 Doping 1 Process B March 3 Metallization 0 Process B March 3 Miscellaneous 0 Process B March 3 Oxide Defect 10 Process B March 3 Silicon Defect 1 Process B March 4 Contamination 2 Process B March 4 Corrosion 2 Process B March 4 Doping 1 Process B March 4 Metallization 0 Process B March 4 Miscellaneous 3 Process B March 4 Oxide Defect 7 Process B March 4 Silicon Defect 1 Process B March 5 Contamination 1 Process B March 5 Corrosion 3 Process B March 5 Doping 1 Process B March 5 Metallization 0 Process B March 5 Miscellaneous 1 Process B March 5 Oxide Defect 8 Process B March 5 Silicon Defect 2 ;
In addition to the process variable Cause, this data set has two classification variables: Process and Day. The variable Counts is a frequency variable.
This example creates a series of displays that progressively use more of the classification information.
The following statements create the first display, which analyzes the process variable without taking into account the classification variables:
title 'Pareto Analysis of Capacitor Failures';
proc pareto data=Failure4;
vbar Cause / freq = Counts
last = 'Miscellaneous'
scale = count
anchor = bl
odstitle = title
nlegend;
run;
The chart, shown in Output 15.2.1, indicates that contamination is the most frequently occurring problem.
Output 15.2.1: Pareto Analysis without Classification Variables

The ANCHOR= BL option anchors the cumulative percentage curve at the bottom left (BL) of the first bar. The NLEGEND option adds a sample size legend.
Process
The following statements specify Process as a classification variable to create a comparative Pareto chart, which is displayed in Output 15.2.2:
proc pareto data=Failure4;
vbar Cause / class = Process
freq = Counts
last = 'Miscellaneous'
scale = count
odstitle = title
nocurve
nlegend;
run;
Output 15.2.2: One-Way Comparative Pareto Analysis with CLASS=Process

Each cell corresponds to a level of the CLASS=
variable (Process). By default, the cells are arranged from top to bottom in alphabetical order of the formatted values of Process, and the key cell is the top cell. The main difference in the two cells is a decrease in contamination when Process B is
used.
The NOCURVE option suppresses the cumulative percentage curve, along with the cumulative percentage axis.
Day
The following statements specify Day as a classification variable:
title 'Pareto Analysis by Day';
proc pareto data=Failure4;
vbar Cause / class = Day
freq = Counts
last = 'Miscellaneous'
scale = count
catleglabel = 'Failure Causes:'
odstitle = title
nrows = 1
ncols = 5
freqref = 5 10 15 20
nocatlabel
nocurve
nlegend;
run;
The NROWS= and NCOLS= options display the cells in a side-by-side arrangement. The FREQREF= option adds reference lines perpendicular to the frequency axis. The NOCATLABEL option suppresses the category axis labels, and the CATLEGLABEL= option incorporates that information into the category legend label. The chart is displayed in Output 15.2.3.
Output 15.2.3: One-Way Comparative Pareto Analysis with CLASS=Day

By default, the key cell is the leftmost cell. There were no failures due to metallization starting on March 3 (in fact, process controls to reduce this problem were introduced on this day).
Process and Day
The following statements specify both Process and Day as CLASS= variables to create a two-way comparative Pareto chart:
title 'Pareto Analysis by Process and Day';
proc pareto data=Failure4;
vbar Cause / class = ( Process Day )
freq = Counts
nrows = 2
ncols = 5
last = 'Miscellaneous'
scale = count
catleglabel = 'Failure Causes:'
odstitle = title
nocatlabel
nocurve
nlegend;
run;
The chart is displayed in Output 15.2.4.
Output 15.2.4: Two-Way Comparative Pareto Analysis for Process and Day

The cells are arranged in a matrix whose rows correspond to levels of the first CLASS= variable (Process) and whose columns correspond to levels of the second CLASS= variable (Day). The dimensions of the matrix are specified in the NROWS= and NCOLS= options. The key cell is in the upper left corner.
The chart reveals continuous improvement when Process B is used.