PROC KDE: Kernel Density Estimates :: SAS/STAT(R) 9.3 User's Guide

Kernel Density Estimates

A weighted univariate kernel density estimate involves a variable $X$ and a weight variable $W$. Let $(X_i, W_i)$, $i = 1, 2, \ldots, n$, denote a sample of $X$ and $W$ of size $n$. The weighted kernel density estimate of $f(x)$, the density of $X$, is as follows:

$\hat{f}(x) = \frac{1}{\sum_{i=1}^{n} W_i} \sum_{i=1}^{n} W_i \, \varphi_h(x - X_i)$

where $h$ is the bandwidth and

$\varphi_h(x) = \frac{1}{\sqrt{2\pi}\, h} \exp\left( -\frac{x^2}{2h^2} \right)$

is the standard normal density rescaled by the bandwidth. If $h \rightarrow 0$ and $nh \rightarrow \infty$, then the optimal bandwidth is

$h_{\mathrm{AMISE}} = \left[ \frac{1}{2\sqrt{\pi}\, n \int (f'')^2} \right]^{1/5}$

This optimal value is unknown because it depends on the second derivative of the density being estimated, and so approximation methods are required. For a derivation and discussion of these results, refer to Silverman (1986, Chapter 3) and Jones, Marron, and Sheather (1996).
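The following Python sketch illustrates the weighted univariate estimate with a normal kernel, together with one common approximation to the optimal bandwidth. It is an illustration only, not the algorithm that PROC KDE uses. The function names and the sample values are hypothetical. The bandwidth function assumes that the unknown density is normal with standard deviation $\sigma$, in which case $\int (f'')^2 = 3 / (8 \sqrt{\pi} \sigma^5)$ and the optimal bandwidth reduces to $(4/3)^{1/5} \sigma n^{-1/5}$.

```python
import math

def weighted_kde(x, xs, ws, h):
    """Weighted normal-kernel density estimate at the point x.

    xs are the observations, ws the weights, h the bandwidth.
    """
    total_weight = sum(ws)
    acc = 0.0
    for xi, wi in zip(xs, ws):
        u = (x - xi) / h
        # Standard normal density rescaled by the bandwidth h.
        acc += wi * math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * h)
    return acc / total_weight

def normal_reference_bandwidth(sigma, n):
    """Optimal bandwidth under the ASSUMPTION that f is normal with sd sigma."""
    # Integral of the squared second derivative of a normal density.
    roughness = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return (1.0 / (2.0 * math.sqrt(math.pi) * n * roughness)) ** 0.2

# Hypothetical sample and weights, for illustration only.
xs = [1.2, 1.9, 2.4, 3.1, 4.0]
ws = [1.0, 2.0, 1.0, 1.0, 3.0]
h = normal_reference_bandwidth(sigma=1.0, n=len(xs))
density_at_2 = weighted_kde(2.0, xs, ws, h)
```

Because the weights are normalized by their sum, the estimate integrates to 1 for any positive weights. In practice $\sigma$ is itself unknown and would be replaced by a sample estimate.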

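For the bivariate case, let $\mathbf{X} = (X, Y)$ be a bivariate random element taking values in $\mathbb{R}^2$ with joint density function

$f(x, y), \quad (x, y) \in \mathbb{R}^2$

and let $\mathbf{X}_i = (X_i, Y_i)$, $i = 1, 2, \ldots, n$, be a sample of size $n$ drawn from this distribution. The kernel density estimate of $f(x, y)$ based on this sample is

$\begin{aligned} \hat{f}(x, y) &= \frac{1}{n} \sum_{i=1}^{n} \varphi_{\mathbf{h}}(x - X_i, \, y - Y_i) \\ &= \frac{1}{n h_X h_Y} \sum_{i=1}^{n} \varphi\left( \frac{x - X_i}{h_X}, \frac{y - Y_i}{h_Y} \right) \end{aligned}$

where $(x, y) \in \mathbb{R}^2$, $h_X > 0$ and $h_Y > 0$ are the bandwidths, and $\varphi_{\mathbf{h}}(x, y)$ is the rescaled normal density

$\varphi_{\mathbf{h}}(x, y) = \frac{1}{h_X h_Y} \, \varphi\left( \frac{x}{h_X}, \frac{y}{h_Y} \right)$

where $\varphi(x, y)$ is the standard normal density function

$\varphi(x, y) = \frac{1}{2\pi} \exp\left( -\frac{x^2 + y^2}{2} \right)$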
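The bivariate estimate can be sketched in Python as follows. This is a direct, unoptimized evaluation of a product normal kernel with separate bandwidths in each direction, and it is not the computational method that PROC KDE uses. The function name `bivariate_kde` and the sample points are hypothetical.

```python
import math

def bivariate_kde(x, y, points, h_x, h_y):
    """Bivariate normal-kernel density estimate at (x, y).

    points is a list of (X_i, Y_i) pairs; h_x and h_y are the bandwidths.
    """
    total = 0.0
    for xi, yi in points:
        u = (x - xi) / h_x
        v = (y - yi) / h_y
        # Standard bivariate normal density evaluated at the scaled offsets.
        total += math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi)
    # Dividing by n * h_x * h_y applies the rescaling and the sample average.
    return total / (len(points) * h_x * h_y)

# Hypothetical sample, for illustration only.
points = [(-1.0, 0.5), (0.0, 0.0), (1.0, -0.5)]
value_at_origin = bivariate_kde(0.0, 0.0, points, h_x=0.5, h_y=0.8)
```

Each observation contributes one rescaled normal bump centered at $(X_i, Y_i)$, so the cost of one evaluation grows linearly with the sample size.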

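Under mild regularity assumptions about $f(x, y)$, the mean integrated squared error (MISE) of $\hat{f}(x, y)$ is

$\begin{aligned} \mathrm{MISE}(h_X, h_Y) &= \mathrm{E} \int\!\!\int \left( \hat{f}(x, y) - f(x, y) \right)^2 dx \, dy \\ &= \frac{1}{4\pi n h_X h_Y} + \frac{h_X^4}{4} \int\!\!\int \left( \frac{\partial^2 f}{\partial x^2} \right)^2 dx \, dy + \frac{h_X^2 h_Y^2}{2} \int\!\!\int \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} \, dx \, dy \\ &\quad + \frac{h_Y^4}{4} \int\!\!\int \left( \frac{\partial^2 f}{\partial y^2} \right)^2 dx \, dy + o\left( h_X^4 + h_Y^4 + \frac{1}{n h_X h_Y} \right) \end{aligned}$

as $h_X \rightarrow 0$, $h_Y \rightarrow 0$ and $n h_X h_Y \rightarrow \infty$.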

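Now set

$\begin{aligned} \mathrm{AMISE}(h_X, h_Y) &= \frac{1}{4\pi n h_X h_Y} + \frac{h_X^4}{4} \psi_{40} + \frac{h_X^2 h_Y^2}{2} \psi_{22} + \frac{h_Y^4}{4} \psi_{04} \\ \psi_{40} &= \int\!\!\int \left( \frac{\partial^2 f}{\partial x^2} \right)^2 dx \, dy, \quad \psi_{22} = \int\!\!\int \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} \, dx \, dy, \quad \psi_{04} = \int\!\!\int \left( \frac{\partial^2 f}{\partial y^2} \right)^2 dx \, dy \end{aligned}$

which is the asymptotic mean integrated squared error (AMISE). For fixed $n$, this has a minimum at $(h_{\mathrm{AMISE},X}, h_{\mathrm{AMISE},Y})$ defined as

$h_{\mathrm{AMISE},X} = \left[ \frac{\psi_{04}^{3/4}}{4\pi n \, \psi_{40}^{3/4} \left( \psi_{40}^{1/2} \psi_{04}^{1/2} + \psi_{22} \right)} \right]^{1/6}$

and

$h_{\mathrm{AMISE},Y} = \left[ \frac{\psi_{40}^{3/4}}{4\pi n \, \psi_{04}^{3/4} \left( \psi_{40}^{1/2} \psi_{04}^{1/2} + \psi_{22} \right)} \right]^{1/6}$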

These are the optimal asymptotic bandwidths in the sense that they minimize the AMISE, the leading terms of the MISE. However, as in the univariate case, these expressions contain the second derivatives of the unknown density $f(x, y)$ being estimated, and so approximations are required. Refer to Wand and Jones (1993) for further details.
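As a check on the bivariate result, the following Python sketch evaluates the asymptotic bandwidths for a case where the second-derivative integrals are known in closed form. It assumes the AMISE for a product normal kernel with one bandwidth per direction, including the mixed-derivative term, as in Wand and Jones (1993). It also assumes that the unknown density is bivariate normal with independent components, which is an illustrative assumption and not something PROC KDE requires. The names `amise`, `amise_bandwidths`, `normal_psis`, and the symbols `psi40`, `psi22`, `psi04` (the integrals of the squared and mixed second partial derivatives of $f$) are hypothetical.

```python
import math

def amise(h_x, h_y, n, psi40, psi22, psi04):
    """Asymptotic MISE for a product normal kernel with bandwidths h_x, h_y."""
    variance = 1.0 / (4.0 * math.pi * n * h_x * h_y)
    bias_sq = 0.25 * (h_x ** 4 * psi40
                      + 2.0 * h_x ** 2 * h_y ** 2 * psi22
                      + h_y ** 4 * psi04)
    return variance + bias_sq

def amise_bandwidths(n, psi40, psi22, psi04):
    """Closed-form minimizer of amise() over (h_x, h_y)."""
    kernel_roughness = 1.0 / (4.0 * math.pi)  # integral of the squared kernel
    common = math.sqrt(psi40 * psi04) + psi22
    h_x = (kernel_roughness * psi04 ** 0.75
           / (n * psi40 ** 0.75 * common)) ** (1.0 / 6.0)
    # The first-order conditions imply h_x^4 * psi40 == h_y^4 * psi04.
    h_y = h_x * (psi40 / psi04) ** 0.25
    return h_x, h_y

def normal_psis(sigma_x, sigma_y):
    """psi40, psi22, psi04 ASSUMING f is normal with independent components."""
    c = 16.0 * math.pi
    psi40 = 3.0 / (c * sigma_x ** 5 * sigma_y)
    psi22 = 1.0 / (c * sigma_x ** 3 * sigma_y ** 3)
    psi04 = 3.0 / (c * sigma_x * sigma_y ** 5)
    return psi40, psi22, psi04

n = 200
psis = normal_psis(sigma_x=1.0, sigma_y=2.0)
h_x, h_y = amise_bandwidths(n, *psis)
```

Under these assumptions the bandwidths simplify to $\sigma_X n^{-1/6}$ and $\sigma_Y n^{-1/6}$, so each bandwidth scales with the spread of its own variable and shrinks slowly as the sample size grows.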
