Surprises in High-Dimensional Ridgeless Least Squares Interpolation

Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm (`ridgeless') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the `double descent' behavior of the prediction risk and the potential benefits of overparametrization.
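As a minimal illustration (not from the paper), the sketch below simulates the minimum $\ell_2$ norm interpolator under the linear feature model in the simplest isotropic case $\Sigma = I$ (so $x_i = z_i$), and prints the empirical prediction risk across aspect ratios $\gamma = p/n$. The signal-to-noise ratio, sample sizes, and the helper name `min_norm_risk` are illustrative choices, not quantities from the paper; the min-norm solution is computed via the pseudoinverse, which coincides with ordinary least squares when $p \le n$ and picks the smallest-norm interpolator when $p > n$. The risk should spike near $\gamma = 1$ and descend again for $\gamma > 1$, the double descent shape.

```python
# Sketch: double descent of the min-norm least squares interpolator,
# assuming the isotropic linear feature model x_i = z_i (Sigma = I).
import numpy as np

rng = np.random.default_rng(0)

def min_norm_risk(n, p, snr=1.0, n_test=2000):
    """Empirical out-of-sample risk of the minimum l2-norm interpolator."""
    beta = rng.standard_normal(p)
    beta *= np.sqrt(snr) / np.linalg.norm(beta)  # fix ||beta||^2 = snr
    X = rng.standard_normal((n, p))              # rows are the x_i
    y = X @ beta + rng.standard_normal(n)        # unit noise variance
    beta_hat = np.linalg.pinv(X) @ y             # min-norm solution X^+ y
    X_test = rng.standard_normal((n_test, p))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

n = 100
for gamma in [0.2, 0.5, 0.8, 0.95, 1.05, 1.5, 2.0, 5.0]:
    p = max(1, int(gamma * n))
    risks = [min_norm_risk(n, p) for _ in range(20)]
    print(f"p/n = {gamma:4.2f}   risk = {np.mean(risks):.3f}")
```

Replacing the feature matrix with $\varphi(Z W^\top)$ for a random weight matrix $W$ and a componentwise nonlinearity such as `np.tanh` would give the corresponding random one-layer network model.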