
Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression

Abstract

This paper presents a novel data-driven strategy for choosing the hyperparameter $k$ in the $k$-NN regression estimator. We treat the choice of the hyperparameter as an iterative procedure (over $k$) and propose a strategy, easy to implement in practice, based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under a fixed-design assumption on the covariates, over some smoothness function classes, for instance the class of Lipschitz functions on a bounded domain. The novel strategy then shows consistent simulation results on artificial and real-world data sets in comparison with other model selection strategies, such as the Hold-out method and generalized cross-validation. The novelty of the strategy lies in reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one has to choose $k$ among $\{1, \ldots, n\}$, the strategy reduces the computational time of generalized cross-validation or Akaike's AIC criterion from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2(n - k))$, where $k$ is the number of nearest neighbors selected by the minimum discrepancy principle.
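To make the iterative procedure concrete, the following is a minimal Python sketch of one common form of the discrepancy-principle stopping rule for $k$-NN regression with a known noise level $\sigma^2$: scan $k$ downward from $n$ (smoothest fit first) and stop the first time the empirical residual risk falls below the noise level. The function name `mdp_choose_k`, the inclusion of each point among its own neighbors, and the exact threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mdp_choose_k(X, y, sigma2):
    """Sketch of a minimum-discrepancy-principle choice of k in k-NN
    regression (fixed design, known noise level sigma2).

    Iterates from the smoothest estimator (k = n) down to k = 1 and
    returns the first k whose empirical risk drops below sigma2.
    """
    n = len(y)
    # Pairwise distances, each row sorted once; column j of `order`
    # holds the index of the j-th nearest neighbor (self at j = 0).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dists, axis=1)

    for k in range(n, 0, -1):
        # k-NN fitted values: average of y over the k nearest neighbors.
        fitted = y[order[:, :k]].mean(axis=1)
        risk = np.mean((y - fitted) ** 2)  # empirical (residual) risk
        if risk <= sigma2:                 # discrepancy-principle stop
            return k
    return 1
```

Because the scan stops as soon as the risk crosses the noise level, only the $n - k$ largest candidate values of $k$ are ever evaluated, which is the source of the $\mathcal{O}(n^2(n - k))$ cost quoted above, as opposed to evaluating a criterion at all $n$ candidates.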
