
Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression

Abstract

This paper presents a novel data-driven strategy for choosing the hyperparameter $k$ in the $k$-NN regression estimator. We treat the choice of the hyperparameter as an iterative procedure (over $k$) and propose a strategy, easy to implement in practice, based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under a fixed-design assumption on the covariates, over some smoothness function classes, for instance the class of Lipschitz functions on a bounded domain. The novel strategy then shows consistent simulation results on artificial and real-world data sets in comparison with other model selection strategies, such as the Hold-out method and generalized cross-validation. The novelty of the strategy lies in reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one has to choose $k$ among $\{1, \ldots, n\}$, the strategy reduces the computational time of generalized cross-validation or Akaike's AIC criterion from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2(n - k))$, where $k$ is the number of nearest neighbors selected by the minimum discrepancy principle.
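To make the iterative procedure concrete, the following is a minimal Python sketch of one common form of the discrepancy-principle stopping rule for $k$-NN regression with a known noise level $\sigma^2$: scan $k$ downward from $n$ (smoothest fit first) and stop the first time the empirical residual risk falls below the noise level. The function name `mdp_choose_k`, the inclusion of each point among its own neighbors, and the exact threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mdp_choose_k(X, y, sigma2):
    """Sketch of a minimum-discrepancy-principle choice of k in k-NN
    regression (fixed design, known noise level sigma2).

    Iterates from the smoothest estimator (k = n) down to k = 1 and
    returns the first k whose empirical risk drops below sigma2.
    """
    n = len(y)
    # Pairwise distances, each row sorted once; column j of `order`
    # holds the index of the j-th nearest neighbor (self at j = 0).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dists, axis=1)

    for k in range(n, 0, -1):
        # k-NN fitted values: average of y over the k nearest neighbors.
        fitted = y[order[:, :k]].mean(axis=1)
        risk = np.mean((y - fitted) ** 2)  # empirical (residual) risk
        if risk <= sigma2:                 # discrepancy-principle stop
            return k
    return 1
```

Because the scan stops as soon as the risk crosses the noise level, only the $n - k$ largest candidate values of $k$ are ever evaluated, which is the source of the $\mathcal{O}(n^2(n - k))$ cost quoted above, as opposed to evaluating a criterion at all $n$ candidates.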
