Symmetry & critical points for a model shallow neural network

Abstract

We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ neurons, where labels are assumed to be generated by a target network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as infinite series in $1/\sqrt{k}$. These expressions are then used to derive estimates for several related quantities which imply that \emph{not all spurious minima are alike}. For example, we show that while the loss function at certain types of spurious minima decays to zero as $O(k^{-1})$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem.
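The student-teacher setup described above can be sketched in a few lines. This is a minimal illustration, not the paper's construction: it assumes a squared loss, Gaussian inputs, and second-layer weights fixed to ones (a common simplification in this line of work); the function and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def student_teacher_loss(W, V, X):
    """Empirical squared loss between a student two-layer ReLU net
    with weight rows W (k x d) and a teacher with rows V (k x d),
    evaluated on inputs X (n x d). Output neurons sum the k hidden
    units with unit second-layer weights."""
    student = relu(X @ W.T).sum(axis=1)
    teacher = relu(X @ V.T).sum(axis=1)
    return 0.5 * np.mean((student - teacher) ** 2)

k, d, n = 4, 6, 10_000
V = rng.standard_normal((k, d))   # teacher weights generate the labels
W = rng.standard_normal((k, d))   # random student initialization
X = rng.standard_normal((n, d))   # Gaussian inputs

print(student_teacher_loss(W, V, X))  # strictly positive at a random student
print(student_teacher_loss(V, V, X))  # exactly zero when student = teacher
```

Note the permutation symmetry: relabeling the student's rows of `W` leaves the loss unchanged, which is the kind of symmetry the paper exploits to organize families of critical points.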
