| Kernel Density Estimation |
Kernel density estimation is a useful tool for exploring the unknown underlying distribution of a sample. The kernel method constructs an estimate f_h(t) of the true density function by placing a kernel function K(t; x_i, h) over each observation x_i in the sample. The kernel function K(t; x, h) is itself a density function with location parameter x and scale parameter h, also called the bandwidth in this context. The density estimate is then given by
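$$f_h(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - x_i}{h}\right)$$
where n denotes the sample size and K(u) is the kernel in standardized form, so that K(t; x_i, h) = K((t - x_i)/h)/h. The choice of kernel function K is not very critical for the resulting estimate f_h(t), and so a Gaussian kernel is used.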
The following graph, showing the sum of the normal kernels at 5 data points, illustrates the idea behind kernel density estimation.
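The construction can also be sketched in a few lines of Python. This is only an illustrative sketch, not the program's own implementation: the function names `gaussian_kernel` and `kde`, the five sample values, the bandwidth h = 1 and the evaluation grid are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the standardized kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(t, sample, h):
    """Evaluate f_h(t) = sum_i K((t - x_i) / h) / (n * h) at a single point t."""
    n = len(sample)
    return sum(gaussian_kernel((t - x) / h) for x in sample) / (n * h)

# Five made-up observations and an arbitrary bandwidth (both hypothetical).
sample = [-2.1, -1.3, -0.4, 1.9, 5.1]
h = 1.0

# Evaluate the estimate on a regular grid that extends well beyond the data.
step = 0.01
grid = [-10.0 + step * k for k in range(2301)]
density = [kde(t, sample, h) for t in grid]

# A density estimate should integrate to roughly 1 (simple Riemann sum).
area = step * sum(density)
print(round(area, 3))
```

Each observation contributes one normal curve of width h centred at x_i, and the estimate is the average of those curves. Increasing h smooths the estimate; decreasing h makes the individual bumps more visible.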
For automatic use of kernel density estimation, it is very helpful to estimate the bandwidth h from the data. The following automatic, data-driven estimates are available (n = the number of observations in the selected variate):
| Sheather & Jones | The method of Sheather & Jones (1991). Jones, Marron & Sheather (1996) recommend this for general purposes |
| Standard Deviation | s1 = 1.06 * (standard deviation) * n**(-1/5) |
| Interquartile Range | s2 = 0.79 * (interquartile range) * n**(-1/5) |
| Min(Std Dev,IQ Range) | s3 = 0.90 * minimum(standard deviation, interquartile range/1.34) * n**(-1/5) |
| Given | You provide your own estimate for the bandwidth in the associated field alongside the drop-down list |
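The three closed-form rules in the table (s1, s2 and s3) can be computed directly, as in the sketch below. Several details are assumptions, since they are not specified above: the function name `rule_of_thumb_bandwidths` is hypothetical, the standard deviation uses the n - 1 divisor, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile definition used by the program. The Sheather & Jones method requires an iterative calculation and is not shown.

```python
import statistics

def rule_of_thumb_bandwidths(sample):
    """Return the s1, s2 and s3 bandwidth estimates for a list of numbers."""
    n = len(sample)
    # Assumption: sample standard deviation with the n - 1 divisor.
    sd = statistics.stdev(sample)
    # Assumption: quartiles from the default ("exclusive") method.
    q1, _median, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    scale = n ** (-1 / 5)
    return {
        "s1": 1.06 * sd * scale,
        "s2": 0.79 * iqr * scale,
        "s3": 0.90 * min(sd, iqr / 1.34) * scale,
    }

# Made-up data for illustration only.
data = [2.3, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 6.1, 7.5, 9.8]
bandwidths = rule_of_thumb_bandwidths(data)
for name, value in bandwidths.items():
    print(name, round(value, 4))
```

Because s3 takes the smaller of the two spread measures, it is less affected than s1 by outliers and long tails, and less affected than s2 when the quartiles happen to lie close together.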