Optimal cross-validation in density estimation with the L2-loss

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, the closed-form expressions also enable us to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with p = 1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n = o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) p/n -> 1 as n -> +infinity with a parametric rate, and (ii) p/n <= 1 - (log n)/n with some nonparametric estimators. These theoretical results are validated by simulation experiments.
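The paper's closed-form expressions cover the general leave-p-out estimator for projection estimators. As a minimal illustration of why such closed forms exist, the sketch below treats only the simplest special case, p = 1 (Loo) for a regular histogram: the naive Loo risk estimate, obtained by removing each point in turn, collapses to a formula in the bin counts alone. The closed form used here, (2n - 1 - sum_k N_k^2) / ((n-1)^2 h), is derived directly for this special case under the convention that both terms of the CV criterion use the held-out estimator; it is an illustrative reconstruction, not the paper's general formula, and all function names are invented for this sketch.

```python
import numpy as np

def loo_risk_naive(counts, n, h):
    """Naive Loo CV estimate of the L2 risk of a histogram density
    estimator: average over held-out points x_i of
    integral(fhat_{-i}^2) - 2 * fhat_{-i}(x_i)."""
    total = 0.0
    for k, Nk in enumerate(counts):
        if Nk == 0:
            continue
        # Every held-out point in bin k gives the same contribution.
        left = counts.astype(float).copy()
        left[k] -= 1.0  # bin counts with one bin-k point removed
        int_sq = np.sum(left ** 2) / ((n - 1) ** 2 * h)
        fhat_at_xi = (Nk - 1) / ((n - 1) * h)
        total += Nk * (int_sq - 2.0 * fhat_at_xi)
    return total / n

def loo_risk_closed_form(counts, n, h):
    """Equivalent closed form in the bin counts: no loop over points."""
    S = float(np.sum(counts.astype(float) ** 2))
    return (2 * n - 1 - S) / ((n - 1) ** 2 * h)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
edges = np.linspace(-4.0, 4.0, 21)   # 20 equal-width bins
h = edges[1] - edges[0]
counts, _ = np.histogram(x, bins=edges)
n = int(counts.sum())                # points that fall inside the bins

naive = loo_risk_naive(counts, n, h)
closed = loo_risk_closed_form(counts, n, h)
print(naive, closed)                 # the two estimates coincide
```

The same collapse happens for general p: averaging over all C(n, p) test sets only involves low-order moments of the bin counts, which is what makes exact Lpo computable without enumeration.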