Symmetry & critical points for a model shallow neural network
Abstract
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ neurons, where labels are assumed to be generated by a target network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as infinite series in $k^{-1/2}$. These expressions are then used to derive estimates for several related quantities which imply that \emph{not all spurious minima are alike}. For example, we show that while the loss function at certain types of spurious minima decays to zero as $k\to\infty$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem.
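As a schematic illustration of the dichotomy described above (the notation here is ours, not the paper's): writing $W(k)$ for a family of critical points and $\mathcal{L}$ for the loss, the two regimes read
\[
  W(k) = W_0 + W_1\,k^{-1/2} + W_2\,k^{-1} + \cdots, \qquad
  \mathcal{L}\bigl(W(k)\bigr) \xrightarrow{\;k\to\infty\;}
  \begin{cases}
    0 & \text{for certain families of spurious minima,}\\
    c > 0 & \text{for others.}
  \end{cases}
\]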
