Distribution Analyses

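*Kernel density estimation* provides normal, triangular, and quadratic kernel density estimators. The general form of a kernel estimator is

$$\hat{f}_{\lambda}(y) = \frac{1}{n\lambda} \sum_{i=1}^{n} K_0\!\left(\frac{y - y_i}{\lambda}\right)$$

where $K_0$ is a kernel function, *λ* is the bandwidth, and $y_1, \ldots, y_n$ are the *n* observed values.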

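Some symmetric probability density functions commonly used as kernel functions are

$$
\begin{aligned}
\text{Normal:} \quad & K_0(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} && \text{for } -\infty < t < \infty \\
\text{Triangular:} \quad & K_0(t) = 1 - |t| && \text{for } |t| \le 1, \text{ and } 0 \text{ otherwise} \\
\text{Quadratic:} \quad & K_0(t) = \tfrac{3}{4}\left(1 - t^2\right) && \text{for } |t| \le 1, \text{ and } 0 \text{ otherwise}
\end{aligned}
$$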
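As an illustration, the three kernels and the general estimator can be sketched in a few lines of Python. This is not SAS/INSIGHT code: the function names, the argument `lam` (the bandwidth), and the sample values are made up for the example, and the estimator is evaluated directly rather than on a grid.

```python
import math

def normal_kernel(t):
    """Standard normal density, defined for every real t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def triangular_kernel(t):
    """1 - |t| on [-1, 1], zero elsewhere."""
    return 1.0 - abs(t) if abs(t) <= 1.0 else 0.0

def quadratic_kernel(t):
    """(3/4)(1 - t^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def kernel_density(y, data, lam, kernel=normal_kernel):
    """Average of kernels centred on each observation, scaled by the bandwidth lam."""
    n = len(data)
    return sum(kernel((y - yi) / lam) for yi in data) / (n * lam)

# Hypothetical sample, used only to show the call.
sample = [1.2, 1.9, 2.1, 2.4, 3.7]
density_at_2 = kernel_density(2.0, sample, lam=0.5)
density_at_2_tri = kernel_density(2.0, sample, lam=0.5, kernel=triangular_kernel)
```

Because each kernel integrates to one, the estimate itself integrates to one for any positive bandwidth.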

Both theory and practice suggest that the choice of a kernel function is not crucial to the statistical performance of the method (Epanechnikov 1969). With a specific kernel function, the value of the bandwidth *λ* determines the degree of averaging in the estimate of the density function and is called a *smoothing parameter*. For a sample of *n* observations, you select a bandwidth *λ* for each kernel estimator by specifying *c* in the formula

$$\lambda = c\, Q\, n^{-1/5}$$

where *Q* is the sample interquartile range of the **Y** variable. This formulation makes *c* independent of the units of **Y**.
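The unit independence of *c* can be checked numerically. The sketch below assumes the bandwidth has the form λ = *c* *Q* *n*^(-1/5); the function name and the data are hypothetical, and the quartiles come from Python's `statistics.quantiles`, whose definition may differ slightly from the one SAS/INSIGHT uses.

```python
import statistics

def bandwidth_from_c(data, c):
    """Bandwidth lam = c * Q * n**(-1/5), with Q the sample interquartile range."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return c * (q3 - q1) * n ** (-0.2)

# The same made-up measurements expressed in two units.
heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]
heights_m = [h / 100.0 for h in heights_cm]

lam_cm = bandwidth_from_c(heights_cm, c=0.5)
lam_m = bandwidth_from_c(heights_m, c=0.5)
# lam_cm is 100 times lam_m: the bandwidth rescales with the data,
# so the same c gives the same amount of smoothing in either unit.
```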

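For a specific kernel function, the discrepancy between the density estimator $\hat{f}_{\lambda}(y)$ and the true density *f*(*y*) can be measured by the mean integrated square error

$$\mathrm{MISE}(\lambda) = \int \left\{ E\!\left[\hat{f}_{\lambda}(y)\right] - f(y) \right\}^2 dy + \int \mathrm{Var}\!\left[\hat{f}_{\lambda}(y)\right] dy$$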

which is the sum of the integrated square bias and the integrated variance.

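An approximate mean integrated square error (AMISE) based on the bandwidth *λ* is

$$\mathrm{AMISE}(\lambda) = \frac{1}{4}\,\lambda^4 \left( \int t^2 K_0(t)\, dt \right)^2 \int \left( f''(y) \right)^2 dy + \frac{1}{n\lambda} \int K_0(t)^2\, dt$$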

If *f*(*y*) is assumed normal, then a bandwidth based on the sample mean and variance can be computed to minimize AMISE. The resulting bandwidth for a specific kernel is used when the associated kernel function is selected in the density estimation options dialog. This is equivalent to choosing **MISE** from the normal, triangular, or quadratic kernel menus. If *f*(*y*) is not roughly normal, this choice may not be appropriate.
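When *f*(*y*) is normal, the bandwidth that minimizes the standard AMISE expression has a closed form, and it can be converted to a *c* value. The sketch below is an assumption-laden illustration rather than the SAS/INSIGHT computation: it assumes the bandwidth is written as *c* *Q* *n*^(-1/5) and that the interquartile range takes its normal-theory value (about 1.349 standard deviations). The names `KERNEL_CONSTANTS` and `normal_reference_c` are invented for the example.

```python
import math
from statistics import NormalDist

# For each kernel K: (integral of K(t)^2 dt, integral of t^2 K(t) dt).
KERNEL_CONSTANTS = {
    "normal": (1.0 / (2.0 * math.sqrt(math.pi)), 1.0),
    "triangular": (2.0 / 3.0, 1.0 / 6.0),
    "quadratic": (3.0 / 5.0, 1.0 / 5.0),
}

def normal_reference_c(kernel_name):
    """c for which c * Q * n**(-1/5) minimizes AMISE when f is normal."""
    roughness, second_moment = KERNEL_CONSTANTS[kernel_name]
    # Integral of f''(y)^2 for a normal density with unit standard deviation.
    curvature = 3.0 / (8.0 * math.sqrt(math.pi))
    # Minimizing AMISE over lam gives
    # lam = [roughness / (second_moment^2 * curvature * n)]^(1/5) * sigma.
    lam_factor = (roughness / (second_moment ** 2 * curvature)) ** 0.2
    # Interquartile range of a normal distribution, in units of sigma.
    iqr_factor = 2.0 * NormalDist().inv_cdf(0.75)
    return lam_factor / iqr_factor

c_normal = normal_reference_c("normal")          # about 0.7852
c_triangular = normal_reference_c("triangular")
c_quadratic = normal_reference_c("quadratic")
```

The bounded kernels need a larger *c* than the normal kernel to give a comparable amount of smoothing, because their mass is concentrated on [-1, 1].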

SAS/INSIGHT software divides the range of the data into 128 evenly spaced intervals, approximates the data on this grid, and then uses the fast Fourier transform (Silverman 1986) to estimate the density.
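The idea of approximating the data on a grid can be sketched as follows. This is a simplified stand-in, not the SAS/INSIGHT algorithm: it assigns each observation to the centre of its interval (the source does not say which binning rule is used) and replaces the fast Fourier transform with a direct sum over bins, which computes the same discrete convolution more slowly. The function name and the data are hypothetical, and only the normal kernel is shown.

```python
import math

def binned_normal_kde(data, lam, n_bins=128):
    """Normal-kernel density estimate evaluated at the centres of n_bins intervals."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / n_bins

    # Step 1: replace the data by counts per interval.
    counts = [0] * n_bins
    for y in data:
        idx = min(int((y - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]

    # Step 2: kernel values depend only on the distance between bins,
    # so they are computed once per offset.
    offsets = [math.exp(-0.5 * (d * width / lam) ** 2) for d in range(n_bins)]

    # Step 3: discrete convolution of counts with the kernel (direct sum, no FFT).
    scale = 1.0 / (len(data) * lam * math.sqrt(2.0 * math.pi))
    density = [
        scale * sum(counts[j] * offsets[abs(k - j)] for j in range(n_bins) if counts[j])
        for k in range(n_bins)
    ]
    return centres, density

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data
centres, density = binned_normal_kde(sample, lam=0.5)
```

The cost of step 3 depends on the number of bins rather than on the number of observations, which is the reason for working on a grid.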

If you select a **Weight** variable, the kernel estimator is modified to include the individual observation weights.

You can specify the kernel function in the density estimation options dialog or from the **Curves** menu. When you specify the kernel function in the density estimation options dialog, the bandwidth that minimizes **AMISE** is used. After choosing **Curves:Kernel Density** from the menu, you can specify the kernel function and use either **AMISE** or a specified *c* value in the **Kernel Density Estimation** dialog.

**Figure 38.25:** Kernel Density Dialog

The default uses a normal kernel density with a *c* value that minimizes the AMISE. Figure 38.26 displays normal kernel estimates with *c* = 0.7852 (the value that minimizes AMISE) and *c* = 0.25. Small values of *c* (and hence small values of the smoothing parameter *λ*) produce jagged estimates because the curve follows the data points more closely. Large values of *c* produce smoother estimates. The **Mode** is the point with the largest estimated density. Use the slider to change the value of *c*.
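The effect of *c* on smoothness, and the definition of the mode, can be illustrated with a small sketch. It assumes the bandwidth is *c* *Q* *n*^(-1/5), evaluates a normal kernel estimate directly on a grid, and counts local maxima as a rough measure of jaggedness. The function names, the grid size, and the data are all hypothetical.

```python
import math
import statistics

def normal_kde_on_grid(data, c, n_grid=200):
    """Normal-kernel estimate on an evenly spaced grid, with lam = c * Q * n**(-1/5)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    lam = c * (q3 - q1) * n ** (-0.2)
    lo, hi = min(data) - 3.0 * lam, max(data) + 3.0 * lam
    grid = [lo + k * (hi - lo) / (n_grid - 1) for k in range(n_grid)]
    scale = 1.0 / (n * lam * math.sqrt(2.0 * math.pi))
    dens = [scale * sum(math.exp(-0.5 * ((y - yi) / lam) ** 2) for yi in data)
            for y in grid]
    return grid, dens

def mode_and_peaks(grid, dens):
    """Grid point with the largest density, and the number of local maxima."""
    mode = grid[max(range(len(dens)), key=dens.__getitem__)]
    peaks = sum(1 for k in range(1, len(dens) - 1)
                if dens[k - 1] < dens[k] > dens[k + 1])
    return mode, peaks

# Made-up sample with several small clusters.
sample = [1.0, 1.1, 1.3, 2.0, 2.2, 2.9, 3.0, 3.1, 4.5, 4.6]
mode_small, peaks_small = mode_and_peaks(*normal_kde_on_grid(sample, c=0.25))
mode_large, peaks_large = mode_and_peaks(*normal_kde_on_grid(sample, c=0.7852))
```

With the smaller *c* the estimate keeps a separate bump for each cluster, while the larger *c* merges neighbouring clusters, so `peaks_small` is at least as large as `peaks_large`.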

**Figure 38.26:** Kernel Density Estimation

Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.