Bandwidth selection in kernel empirical risk minimization via the gradient

In this paper, we address the data-driven selection of multidimensional and (possibly) anisotropic bandwidths in the general problem of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models, such as nonparametric regression or statistical learning with errors-in-variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion for measuring the performance of our estimators. This turns out to be helpful for deriving excess risk bounds, with fast rates of convergence, in noisy clustering, as well as adaptive minimax results for pointwise and global estimation in robust nonparametric regression. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a non-trivial extension of the so-called GL method (see Goldenshluger and Lepski [16]) to non-linear estimators. Another main advantage of our selection rule is that it does not depend on the smallest eigenvalue of the Hessian matrix of the risk, an unknown parameter determined by the underlying model.

Keywords and phrases: Adaptivity, Bandwidth Selection, ERM, Errors-in-variables, Robust Regression, Statistical Learning
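To fix ideas, the following is a schematic LaTeX sketch of how a Goldenshluger-Lepski-type comparison can be written for gradient empirical risks; the bandwidth grid \mathcal{H}, the majorant M(\cdot), and the auxiliary estimator \nabla\widehat{R}_{h,h'} are illustrative placeholders, not the paper's exact definitions.

% Schematic GL-type selection rule over gradient empirical risks (illustrative assumptions):
% \nabla\widehat{R}_{h}    -- gradient of the kernel empirical risk with bandwidth h
% \nabla\widehat{R}_{h,h'} -- auxiliary estimator built from the pair of bandwidths (h, h')
% M(h)                     -- majorant (penalty) term controlling the stochastic error
\[
  \widehat{h} \in \operatorname*{arg\,min}_{h \in \mathcal{H}}
  \Bigl\{ \max_{h' \in \mathcal{H}}
    \bigl[ \lVert \nabla\widehat{R}_{h,h'} - \nabla\widehat{R}_{h'} \rVert - M(h') \bigr]_{+}
    + M(h) \Bigr\}.
\]

The first term mimics a bias comparison between pairs of bandwidths, while M(h) accounts for the variance; the selected bandwidth balances the two without requiring knowledge of the Hessian's smallest eigenvalue.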