Two models of double descent for weak features

Abstract

The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models. This article provides a precise mathematical analysis for the shape of this curve in two simple data models with the least squares/least norm predictor. Specifically, it is shown that the risk peaks when the number of features $p$ is close to the sample size $n$, but also that the risk decreases towards its minimum as $p$ increases beyond $n$. This behavior is contrasted with that of "prescient" models that select features in an a priori optimal order.
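
The shape described above can be checked numerically. The following is a minimal simulation sketch, not the article's own setup: it assumes isotropic Gaussian features, equal-magnitude true coefficients with unit total norm, and Gaussian noise. The function names (`min_norm_risk`, `risk_curve`) and all parameter values are illustrative choices. It fits the least squares/least norm predictor on the first p of D features via the pseudoinverse and evaluates the out-of-sample squared error, which under these assumptions equals the squared coefficient error plus the noise variance.

```python
import numpy as np


def min_norm_risk(n, D, p, sigma, rng):
    """Out-of-sample risk of the least-norm fit on the first p of D features.

    Assumed (illustrative) data model: x ~ N(0, I_D), y = x @ beta + noise,
    with every entry of beta equal to 1/sqrt(D), so each feature is weak.
    """
    beta = np.ones(D) / np.sqrt(D)
    X = rng.standard_normal((n, D))
    y = X @ beta + sigma * rng.standard_normal(n)

    # The pseudoinverse gives the least squares solution when p <= n and
    # the minimum-norm interpolating solution when p > n.
    beta_hat = np.zeros(D)
    beta_hat[:p] = np.linalg.pinv(X[:, :p]) @ y

    # For isotropic Gaussian x, E[(y - x @ beta_hat)^2] has this closed form.
    return float(np.sum((beta_hat - beta) ** 2) + sigma ** 2)


def risk_curve(n=40, D=200, sigma=0.5, trials=50, seed=0, ps=None):
    """Average risk over independent trials for each feature count p."""
    rng = np.random.default_rng(seed)
    if ps is None:
        ps = [n // 2, n, D]
    return {
        p: float(np.mean([min_norm_risk(n, D, p, sigma, rng) for _ in range(trials)]))
        for p in ps
    }


if __name__ == "__main__":
    for p, risk in risk_curve().items():
        print(f"p = {p:4d}  average risk = {risk:.3f}")
```

With these settings the average risk should be far larger at p = n than at either p = n/2 or p = D. Passing a denser grid, for example `ps=range(5, 201, 5)`, traces the full curve; the values near p = n are heavy-tailed, so they vary substantially between seeds.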
