Symmetry & critical points for a model shallow neural network

Abstract

We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ neurons, where labels are assumed to be generated by a target network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as infinite series in $1/\sqrt{k}$. These expressions are then used to derive estimates for several related quantities which imply that \emph{not all spurious minima are alike}. For example, we show that while the loss function at certain types of spurious minima decays to zero as $O(k^{-1})$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem.
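The student-teacher setup described above can be sketched in a few lines. This is a minimal illustration, not the paper's construction: it assumes a squared loss, Gaussian inputs, and second-layer weights fixed to ones (a common simplification in this line of work); the function and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def student_teacher_loss(W, V, X):
    """Empirical squared loss between a student two-layer ReLU net
    with weight rows W (k x d) and a teacher with rows V (k x d),
    evaluated on inputs X (n x d). Output neurons sum the k hidden
    units with unit second-layer weights."""
    student = relu(X @ W.T).sum(axis=1)
    teacher = relu(X @ V.T).sum(axis=1)
    return 0.5 * np.mean((student - teacher) ** 2)

k, d, n = 4, 6, 10_000
V = rng.standard_normal((k, d))   # teacher weights generate the labels
W = rng.standard_normal((k, d))   # random student initialization
X = rng.standard_normal((n, d))   # Gaussian inputs

print(student_teacher_loss(W, V, X))  # strictly positive at a random student
print(student_teacher_loss(V, V, X))  # exactly zero when student = teacher
```

Note the permutation symmetry: relabeling the student's rows of `W` leaves the loss unchanged, which is the kind of symmetry the paper exploits to organize families of critical points.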
