The PARETO Procedure

Creating a Pareto Chart from Frequency Data

Note: See Basic Pareto Chart from Frequency Data in the SAS/QC Sample Library.

In some situations, a count (frequency) is already available for each category. In others, you can compress a large data set by creating a frequency variable for the categories before you apply the PARETO procedure.

For example, you can use the FREQ procedure to obtain the compressed data set Failure2 from the data set Failure1:

proc freq data=Failure1;
   tables Cause / noprint out=Failure2;
run;

A listing of Failure2 is shown in Figure 15.3.

Figure 15.3: Data Set Failure2, Which Is Created by Using PROC FREQ

Obs  Cause           COUNT  PERCENT
  1  Contamination      14  45.1613
  2  Corrosion           2   6.4516
  3  Doping              1   3.2258
  4  Metallization       2   6.4516
  5  Miscellaneous       3   9.6774
  6  Oxide Defect        8  25.8065
  7  Silicon Defect      1   3.2258
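
When the counts are already known, the frequency data can also be entered directly instead of being derived with PROC FREQ. The following is a minimal sketch under that assumption. The data set name FailureCounts is hypothetical, and the values are copied from Figure 15.3. The PERCENT variable is omitted because the chart request in this example uses only Cause and Count.

```sas
/* Hypothetical alternative to PROC FREQ: enter known counts directly. */
/* FailureCounts is an illustrative name; values are from Figure 15.3. */
data FailureCounts;
   length Cause $16;
   /* The & modifier lets Cause contain single embedded blanks;  */
   /* at least two blanks must separate Cause from Count.        */
   input Cause $ & Count;
   datalines;
Contamination   14
Corrosion       2
Doping          1
Metallization   2
Miscellaneous   3
Oxide Defect    8
Silicon Defect  1
;
run;
```

A data set entered this way could then be supplied to PROC PARETO with DATA=FailureCounts and FREQ=Count.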



The following statements produce a horizontal Pareto chart for the data in Failure2:

title 'Analysis of Integrated Circuit Failures';
proc pareto data=Failure2;
   hbar Cause / freq     = Count
                scale    = count
                last     = 'Miscellaneous'
                nlegend  = 'Total Circuits'
                odstitle = title1
                markers;
run;

The frequency variable Count is specified in the FREQ= option. Specifying SCALE=COUNT requests a frequency scale for the frequency axis (at the top of the chart). Specifying LAST='Miscellaneous' causes the Miscellaneous category to be displayed last regardless of its frequency. The NLEGEND= option adds a sample size legend labeled "Total Circuits." Specifying ODSTITLE=TITLE1 replaces the default graph title with the title that is specified in the TITLE statement. The MARKERS option places markers at the points on the cumulative percentage curve.

The chart is displayed in Figure 15.4.

Figure 15.4: Pareto Chart with Frequency Scale


Note that in a horizontal Pareto chart, categories are listed in decreasing frequency order from top to bottom on the category axis.

There are two sets of tied categories in this example: Corrosion and Metallization each occur twice, and Doping and Silicon Defect each occur once. PROC PARETO displays tied categories alphabetically in order of their formatted values. Thus, Corrosion appears before Metallization, and Doping appears before Silicon Defect in Figure 15.4. This is simply a convention, and no practical significance should be attached to the order in which tied categories are arranged.
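
The category order described above, together with the values that the cumulative percentage curve plots, can be approximated outside PROC PARETO. The following sketch is an illustration only, not a description of how the procedure works internally. The data set names Ordered and CumPct and the variables IsLast, CumCount, and CumPercent are hypothetical. The sketch assumes that Failure2 was produced by PROC FREQ as shown earlier, and that sorting the raw values of Cause matches the order of the formatted values, which holds here because no format is applied.

```sas
/* Flag the category that LAST='Miscellaneous' would move to the end. */
data Ordered;                        /* hypothetical data set name */
   set Failure2;
   IsLast = (Cause = 'Miscellaneous');
run;

/* Decreasing frequency, ties broken alphabetically by Cause, */
/* with the flagged category placed after all others.         */
proc sort data=Ordered;
   by IsLast descending Count Cause;
run;

/* Running totals: sum statements retain their values across rows. */
data CumPct;                         /* hypothetical data set name */
   set Ordered;
   CumCount   + Count;
   CumPercent + Percent;
run;

proc print data=CumPct noobs;
   var Cause Count Percent CumCount CumPercent;
run;
```

In the resulting listing, Corrosion precedes Metallization and Doping precedes Silicon Defect, which mirrors the tie convention in Figure 15.4.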