Bandwidth selection for multivariate density derivative estimation, with applications to clustering and bump hunting

Abstract

In a recent paper, Chacón, Duong and Wand (2011) provided an asymptotic analysis for kernel estimation of multivariate density derivatives of arbitrary order. However, that paper did not address in detail the most important topic for any kernel estimator in practice, that is, the choice of the bandwidth. In the multivariate context there are different levels of sophistication on the bandwidth matrix to be used in the estimator. The simplest parameterization of such a bandwidth, consisting of a positive scalar multiple of the identity matrix, is easier to analyze from a mathematical point of view, but its lack of flexibility can lead to a substantial loss in terms of efficiency, which worsens as the order of the derivative increases, as compared to the most general parameterization, using a symmetric positive definite matrix. Here we present three new methods which allow for an automatic (data-dependent) selection of the bandwidth matrix within the most general class of matrices. We study their theoretical asymptotic properties and their finite sample behaviour and, as an application, show how these new bandwidth selection methods can be combined with the mean shift algorithm to develop new data-driven nonparametric clustering procedures and feature significance for bump hunting.
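To make the role of the bandwidth matrix concrete, the following is a minimal Python sketch of a Gaussian kernel estimate of a density and its gradient (the first-order derivative), together with one mean shift update. It is an illustration only and not the paper's method: the Gaussian kernel, the function names `gaussian_kde_and_gradient` and `mean_shift_step`, and the hand-picked matrix `H` are all assumptions, and none of the three bandwidth selectors described in the abstract is implemented here. Passing `H = h**2 * np.eye(d)` gives the scalar parameterization, while any symmetric positive definite `H` gives the unconstrained one.

```python
import numpy as np


def gaussian_kde_and_gradient(x, data, H):
    """Gaussian kernel estimate of the density and its gradient at point x.

    x    : array of shape (d,), the evaluation point
    data : array of shape (n, d), the sample
    H    : array of shape (d, d), symmetric positive definite bandwidth
           matrix (chosen by hand here; an assumption, not a selector)
    """
    n, d = data.shape
    H_inv = np.linalg.inv(H)
    diff = x - data  # rows are x - X_i
    # Quadratic forms (x - X_i)^T H^{-1} (x - X_i), one per sample point.
    quad = np.einsum("ij,jk,ik->i", diff, H_inv, diff)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    k = norm * np.exp(-0.5 * quad)  # K_H(x - X_i)
    density = k.mean()
    # Gradient of the Gaussian kernel: -H^{-1} (x - X_i) K_H(x - X_i).
    gradient = -(H_inv @ (diff * k[:, None]).mean(axis=0))
    return density, gradient


def mean_shift_step(x, data, H):
    """One mean shift update: move x along H times the normalized gradient.

    For the Gaussian kernel this equals the kernel-weighted mean of the data.
    """
    density, gradient = gaussian_kde_and_gradient(x, data, H)
    return x + H @ gradient / density
```

Iterating `mean_shift_step` from each observation until the update is negligible moves points uphill towards local modes of the estimated density, and observations that end at the same mode can be assigned to the same cluster. The quality of that grouping depends on `H`, which is the motivation for data-driven selection of the full bandwidth matrix.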
