The loss landscape of overparameterized neural networks

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$ - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n > d$, we show that the locus $M$ of global minima of the loss $L$ is usually not discrete, but rather an $(n-d)$-dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.
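As a minimal sketch of the dimension count behind this statement (the symbols $f_\theta$, $x_i$, $y_i$ below are illustrative notation, not taken from the paper): for a smooth model $f_\theta$ with parameters $\theta \in \mathbb{R}^n$ and a loss such as squared error, the global minima that fit the training data exactly form the set

$$M = \{\theta \in \mathbb{R}^n : f_\theta(x_i) = y_i \ \text{for } i = 1, \dots, d\},$$

which is cut out by $d$ equations in $n$ unknowns; when these constraints are independent (the evaluation map has full rank $d$), the implicit function theorem gives $\dim M = n - d$. For a toy instance, the two-parameter model $f_{(a,b)}(x) = abx$ fit to the single data point $(x_1, y_1) = (1, 1)$ attains zero loss exactly on the hyperbola $ab = 1$, a $1$-dimensional submanifold of $\mathbb{R}^2$, matching $n - d = 2 - 1 = 1$.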