The loss landscape of overparameterized neural networks

Abstract

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$: in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n > d$, we show that the locus $M$ of global minima of the loss $L$ is usually not discrete, but rather an $(n-d)$-dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.
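
The dimension count can be checked numerically. Assuming a squared-error loss $L(\theta) = \frac{1}{2}\sum_{i=1}^{d} (f(\theta, x_i) - y_i)^2$ (the abstract does not fix a loss; this is an illustrative assumption), the Hessian at any zero-loss point reduces to the Gauss-Newton term $J^\top J$, where $J$ is the $d \times n$ Jacobian of the network outputs, so its rank is at most $d$ and at least $n - d$ directions are flat. The sketch below is a minimal illustration, not code from the paper: the toy tanh network, the data, and the optimization schedule are all hypothetical choices made for brevity.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d = 3                                   # number of data points
x = jax.random.normal(key, (d,))
y = jnp.sin(x)                          # arbitrary smooth targets

hidden = 8
n = 3 * hidden                          # n = 24 parameters, so n > d

def loss(theta):
    # Tiny one-hidden-layer tanh net, parameters packed into one vector.
    w1, b1, w2 = theta[:hidden], theta[hidden:2 * hidden], theta[2 * hidden:]
    pred = w2 @ jnp.tanh(jnp.outer(w1, x) + b1[:, None])
    return 0.5 * jnp.sum((pred - y) ** 2)

theta = 0.5 * jax.random.normal(jax.random.PRNGKey(1), (n,))

# Plain gradient descent down to (near) zero training loss.
grad = jax.jit(jax.grad(loss))
for _ in range(50_000):
    theta = theta - 1e-2 * grad(theta)

# At an exact global minimum the residuals vanish, so the Hessian equals
# the Gauss-Newton term J^T J, whose rank is at most d. Hence at least
# n - d eigenvalues are (numerically) zero: flat directions along M.
eigs = jnp.linalg.eigvalsh(jax.hessian(loss)(theta))
print("final loss:", float(loss(theta)))
print("near-zero Hessian eigenvalues:",
      int(jnp.sum(jnp.abs(eigs) < 1e-3)), "expected >=", n - d)
```

Counting eigenvalues below a small threshold (rather than exactly zero) accounts for the residuals not vanishing exactly after finite training; the count of flat directions should come out at least $n - d = 21$ here.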
