The KDE Procedure

Kernel Density Estimates

A weighted univariate kernel density estimate involves a variable X and a weight variable W. Let $(X_{i},W_{i}), \  i=1,2,\ldots ,n$, denote a sample of X and W of size n. The weighted kernel density estimate of $f(x)$, the density of X, is as follows:

\[  \hat{f}(x) = \frac{1}{\sum _{i=1}^{n} W_{i}} \sum _{i=1}^{n} W_{i} \varphi _{h}(x-X_{i})  \]

where h is the bandwidth and

\[  \varphi _{h}(x) = \frac{1}{\sqrt {2\pi }h} \exp \left( -\frac{x^{2}}{2h^{2}} \right)  \]

is the standard normal density rescaled by the bandwidth. Provided that $h\rightarrow 0$ and $nh\rightarrow \infty $ as $n\rightarrow \infty $, the bandwidth that minimizes the asymptotic mean integrated squared error (AMISE) is

\[  h_\mr {AMISE} = \left[ \frac{1}{2\sqrt {\pi } n \int (f'')^{2}} \right]^{1/5}  \]

Because this expression involves $\int (f'')^{2}$, a functional of the unknown density, the optimal value cannot be computed directly, and approximation methods are required. For a derivation and discussion of these results, see Silverman (1986, Chapter 3) and Jones, Marron, and Sheather (1996).
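
As an illustration (a sketch, not the procedure's own implementation), the following Python code evaluates the weighted estimate $\hat{f}(x)$ above with a Gaussian kernel. The simulated data, the weights, and the normal-reference bandwidth $(4/3)^{1/5}\, \hat{\sigma }\, n^{-1/5}$ (which is $h_\mr {AMISE}$ evaluated for a normal density with standard deviation $\hat{\sigma }$, since then $\int (f'')^{2} = 3/(8\sqrt {\pi }\, \hat{\sigma }^{5})$) are assumptions made for this example.

import numpy as np

def weighted_kde(x_grid, x, w, h):
    # f_hat(t) = (1 / sum_i w_i) * sum_i w_i * phi_h(t - x_i)
    u = (x_grid[:, None] - x[None, :]) / h
    phi_h = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    return phi_h @ w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=500)                 # sample of X (assumed data)
w = rng.uniform(0.5, 1.5, size=500)      # positive weights W (assumed data)
n = x.size

# Normal-reference approximation to h_AMISE (an assumption of this example)
sigma = x.std(ddof=1)
h = (4.0 / 3.0) ** 0.2 * sigma * n ** (-0.2)

grid = np.linspace(-4.0, 4.0, 201)
f_hat = weighted_kde(grid, x, w, h)      # density estimate on the grid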

For the bivariate case, let $\mb {X} = (X,Y)$ be a bivariate random element taking values in $R^2$ with joint density function

\[  f(x,y), \  (x,y) \in R^2  \]

and let $\mb {X}_{i} = (X_{i},Y_{i}), \  i = 1,2, \ldots , n$, be a sample of size n drawn from this distribution. The kernel density estimate of $f(x,y)$ based on this sample is

\[  \hat{f}(x,y) = \frac{1}{n} \sum _{i=1}^{n} \varphi _{\mb {h}}(x-X_{i},y-Y_{i}) = \frac{1}{nh_{X}h_{Y}} \sum _{i=1}^{n}\varphi \left( \frac{x-X_{i}}{h_{X}}, \frac{y-Y_{i}}{h_{Y}} \right)  \]

where $(x,y) \in R^2$, $h_{X}>0$ and $h_{Y}>0$ are the bandwidths, and $\varphi _{\mb {h}}(x,y)$ is the rescaled normal density

\[  \varphi _{\mb {h}}(x,y) = \frac{1}{ h_{X}h_{Y}} \varphi \left( \frac{x}{h_{X}}, \frac{y}{h_{Y}} \right)  \]

where $\varphi (x,y)$ is the standard normal density function

\[  \varphi (x,y) = \frac{1}{2\pi } \exp \left( -\frac{x^{2}+y^{2}}{2} \right)  \]
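
The following Python sketch simply evaluates this bivariate estimate on a grid with the product Gaussian kernel; the simulated data and the fixed bandwidths $h_{X} = h_{Y} = 0.3$ are illustrative assumptions, not values produced by the procedure.

import numpy as np

def bivariate_kde(xg, yg, x, y, h_x, h_y):
    # f_hat(x, y) = 1 / (n h_x h_y) * sum_i phi((x - X_i)/h_x, (y - Y_i)/h_y)
    u = (xg[:, None] - x[None, :]) / h_x
    v = (yg[:, None] - y[None, :]) / h_y
    phi = np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)
    return phi.sum(axis=1) / (x.size * h_x * h_y)

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.8, size=1000)   # correlated toy sample (assumed)

gx, gy = np.meshgrid(np.linspace(-3.0, 3.0, 41), np.linspace(-3.0, 3.0, 41))
f_hat = bivariate_kde(gx.ravel(), gy.ravel(), x, y, h_x=0.3, h_y=0.3).reshape(gx.shape)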

Under mild regularity assumptions about $f(x,y)$, the mean integrated squared error (MISE) of $\hat{f}(x,y)$ is

\[ \begin{aligned} \textrm{MISE}(h_{X},h_{Y}) &= \textrm{E}\int (\hat{f}-f)^{2} \\ &= \frac{1}{4\pi n h_{X} h_{Y}} + \frac{h_{X}^{4}}{4}\int \left(\frac{\partial ^{2}f}{\partial X^{2}}\right)^{2}\, dx\, dy + \frac{h_{Y}^{4}}{4}\int \left(\frac{\partial ^{2}f}{\partial Y^{2}}\right)^{2}\, dx\, dy \\ &\quad + o\left(h_{X}^{4} + h_{Y}^{4} + \frac{1}{nh_{X}h_{Y}}\right) \end{aligned} \]

as $h_{X} \rightarrow 0$, $h_{Y} \rightarrow 0$ and $n h_{X} h_{Y} \rightarrow \infty $.

Now set

\[  \textrm{AMISE}(h_{X},h_{Y}) = \frac{1}{4\pi n h_{X} h_{Y}} + \frac{h_{X}^{4}}{4}\int \left(\frac{\partial ^{2}f}{\partial X^{2}}\right)^{2}\, dx\, dy + \frac{h_{Y}^{4}}{4}\int \left(\frac{\partial ^{2}f}{\partial Y^{2}}\right)^{2}\, dx\, dy  \]

which is the asymptotic mean integrated squared error (AMISE). For fixed n, this has a minimum at $(h_{\mr {AMISE}\_ X}, h_{\mr {AMISE}\_ Y})$ defined as

\[  h_{\mr {AMISE}\_ X} = \left[\frac{1}{4n\pi \int \left(\frac{\partial ^{2}f}{\partial X^{2}}\right)^{2}}\right]^{1/6} \left[\frac{\int \left(\frac{\partial ^{2}f}{\partial Y^{2}}\right)^{2}}{\int \left(\frac{\partial ^{2}f}{\partial X^{2}}\right)^{2}}\right]^{1/24}  \]

and

\[  h_{\mr {AMISE}\_ Y} = \left[\frac{1}{4n\pi \int \left(\frac{\partial ^{2}f}{\partial Y^{2}}\right)^{2}}\right]^{1/6} \left[\frac{\int \left(\frac{\partial ^{2}f}{\partial X^{2}}\right)^{2}}{\int \left(\frac{\partial ^{2}f}{\partial Y^{2}}\right)^{2}}\right]^{1/24}  \]

These are the optimal asymptotic bandwidths in the sense that they minimize the AMISE. However, as in the univariate case, these expressions involve the second derivatives of the unknown density $f$ being estimated, and so approximation methods are again required. See Wand and Jones (1993) for further details.
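
The closed-form minimizers can be checked numerically; the Python sketch below does so for hypothetical values of $n$ and of the curvature functionals $\int (\partial ^{2}f/\partial X^{2})^{2}$ and $\int (\partial ^{2}f/\partial Y^{2})^{2}$ (values chosen only for illustration, since in practice these functionals are unknown). A grid search over the AMISE surface defined above should land near the same bandwidths.

import numpy as np

n, R_X, R_Y = 1000.0, 0.12, 0.45   # hypothetical n, int(f_XX)^2, int(f_YY)^2

def amise(h_x, h_y):
    # AMISE(h_X, h_Y) as defined above
    return 1.0 / (4.0 * np.pi * n * h_x * h_y) + 0.25 * (h_x ** 4 * R_X + h_y ** 4 * R_Y)

# Closed-form AMISE-optimal bandwidths from the expressions above
h_x_opt = (4.0 * np.pi * n * R_X) ** (-1.0 / 6.0) * (R_Y / R_X) ** (1.0 / 24.0)
h_y_opt = (4.0 * np.pi * n * R_Y) ** (-1.0 / 6.0) * (R_X / R_Y) ** (1.0 / 24.0)

# Brute-force grid search over the AMISE surface for comparison
hs = np.linspace(0.05, 1.0, 400)
H_x, H_y = np.meshgrid(hs, hs)
i, j = np.unravel_index(np.argmin(amise(H_x, H_y)), H_x.shape)
print((h_x_opt, h_y_opt), (H_x[i, j], H_y[i, j]))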