Creating a Pareto Chart from Raw Data

See PARETO4 in the SAS/QC Sample Library.

In the fabrication of integrated circuits, common causes of failures include improper doping, corrosion, surface contamination, silicon defects, metallization, and oxide defects. The causes of 31 failures were recorded in a SAS data set called Failure1.

data Failure1;
   length Cause $ 16;
   label  Cause = 'Cause of Failure';
   input  Cause & $;
   datalines;
Corrosion
Oxide Defect
Contamination
Oxide Defect
Oxide Defect
Miscellaneous
Oxide Defect
Contamination
Metallization
Oxide Defect
Contamination
Contamination
Oxide Defect
Contamination
Contamination
Contamination
Corrosion
Silicon Defect
Miscellaneous
Contamination
Contamination
Contamination
Miscellaneous
Contamination
Contamination
Doping
Oxide Defect
Oxide Defect
Metallization
Contamination
Contamination
;

Each of the 31 observations corresponds to a different circuit, and the value of Cause provides the cause for the failure. These are raw data in the sense that there is more than one observation with the same value of Cause, and the observations are not sorted by Cause. The following statements produce a basic Pareto chart for the failures:

ods graphics off;
symbol v=dot;
proc pareto data=Failure1;
   vbar Cause;
run;

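The ODS GRAPHICS OFF statement disables ODS Graphics, so the chart is produced with traditional graphics. The PROC PARETO statement, referred to as the PROC statement, invokes the PARETO procedure and names the input data set with the DATA= option. You specify the process variable to be analyzed, here Cause, in the VBAR statement.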

The Pareto chart is shown in Figure 15.6.

Figure 15.6: Pareto Chart for IC Failures in the Data Set Failure1



The procedure has classified the values of Cause into seven distinct categories (levels). The bars represent the percent of failures in each category, and they are arranged in decreasing order. Thus, the most frequently occurring category is Contamination, which accounts for 45% of the failures (14 of 31). The Pareto curve indicates the cumulative percent of failures from left to right; for example, Contamination and Oxide Defect together account for 71% of the failures (22 of 31).
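One way to check the percentages read from the chart is to tabulate the raw data directly. The following statements are a minimal sketch that uses PROC FREQ, which is not part of the original example. The ORDER=FREQ option lists the categories in descending order of frequency, and the default one-way table includes percent and cumulative percent columns that correspond to the bars and the Pareto curve.

```sas
/* Sketch only: tabulate Cause in descending order of frequency. */
/* The Percent and Cumulative Percent columns can be compared    */
/* with the bar heights and the Pareto curve in the chart.       */
proc freq data=Failure1 order=freq;
   tables Cause;
run;
```

In the data listed above, Contamination occurs 14 times and Oxide Defect occurs 8 times out of 31 observations, so the table should show roughly 45% for the first row and a cumulative 71% for the second.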

If there is sufficient space, the procedure labels the bars along the horizontal axis as in Figure 15.6. Otherwise, as in Figure 15.7, the procedure numbers the bars from left to right and adds a legend identifying the categories.

Figure 15.7: Pareto Chart with Category Legend



A category legend is likely to be introduced when any of the following is true:

  • The number of categories is large.

  • The category labels are lengthy (as in this example). Category labels can be up to 64 characters long.

  • A large text height is used. You can specify the height with the HEIGHT= option in the VBAR statement or with the HTEXT= option in a GOPTIONS statement (not shown here).
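As an illustration of the last point, the following statements are a sketch of how a larger text height might be requested with the HEIGHT= option. The value 4 is an arbitrary choice for illustration and is assumed here to be interpreted in percent screen units. Whether the procedure actually switches to a category legend depends on the size of the display area and the graphics device.

```sas
ods graphics off;
proc pareto data=Failure1;
   /* HEIGHT= value is illustrative only; larger text leaves */
   /* less room for bar labels along the horizontal axis.    */
   vbar Cause / height=4;
run;
```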