Note: See Superimposing Kernel Density Estimates in the SAS/QC Sample Library.
This example illustrates the use of kernel density estimates to visualize a nonnormal data distribution.
The effective channel length (in microns) is measured for 1225 field effect transistors. The channel lengths are saved as values of the variable Length in a SAS data set named Channel:
data Channel;
   length Lot $ 16;
   input Length @@;
   select;
      when (_n_ <= 425) Lot='Lot 1';
      when (_n_ >= 926) Lot='Lot 3';
      otherwise         Lot='Lot 2';
   end;
   datalines;
0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
1.11 0.98 0.78 1.09 0.94 1.05 0.89 1.16 0.88 1.19
1.01 1.08 1.19 0.94 0.92 1.27 0.90 0.88 1.38 1.02
   ... more lines ...
2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
1.95 1.93 2.22 1.74 1.91
;
When you use kernel density estimates to explore a data distribution, you should try several values of the bandwidth parameter c, because c determines how smooth the estimate is and how closely it follows the data. You can specify a list of C= values with the KERNEL option to request multiple density estimates, as shown in the following statements:
title "FET Channel Length Analysis";
proc capability data=Channel noprint;
   histogram Length / kernel(c = 0.25 0.50 0.75 1.00)
                      odstitle = title;
run;
The display, shown in Output 5.12.1, demonstrates the effect of c. In general, larger values of c yield smoother density estimates, and smaller values yield estimates that more closely fit the data distribution.
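Writing out the estimator may help show why c has this effect. The following sketch assumes the default normal kernel and the bandwidth definition documented for the KERNEL option of the CAPABILITY and UNIVARIATE procedures, in which the bandwidth λ is proportional to c. Q denotes the interquartile range of the data and n the sample size. The scaling factors the procedure applies to overlay the curve on the histogram (bin width and vertical-axis units) are omitted here.

```latex
\hat{f}_{\lambda}(x) = \frac{1}{n\lambda}\sum_{i=1}^{n} K_0\!\left(\frac{x - x_i}{\lambda}\right),
\qquad
K_0(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},
\qquad
\lambda = c\, Q\, n^{-1/5}
```

For the 1225 channel lengths, n^{-1/5} ≈ 0.241. The bandwidth is therefore about 0.060Q for c = 0.25 and about 0.241Q for c = 1.00. A larger λ spreads each kernel over a wider range of x, which averages out local bumps and produces a smoother curve.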
Output 5.12.1: Multiple Kernel Density Estimates
Output 5.12.1 reveals strong trimodality in the data, which is explored further in Creating a One-Way Comparative Histogram.