Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

Abstract
We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.
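To make the quantities behind claims (2) and (4) concrete, the sketch below (illustrative only, not the authors' code) computes the total parameter L2 norm and a per-layer stable rank for a PyTorch network; resnet18 is used here as a stand-in for any trained model, and stable rank is one common proxy for the rank of a weight matrix.

# Minimal sketch, assuming PyTorch and torchvision are installed; resnet18 is a
# hypothetical stand-in for a trained network under study.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)

# Total L2 norm of all parameters (the quantity claim (2) relates to generalization).
param_norm = torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters()))
print(f"total parameter L2 norm: {param_norm.item():.2f}")

# Stable rank ||W||_F^2 / ||W||_2^2 of each weight matrix, with conv kernels
# flattened to (out_channels, in_channels * k * k); a proxy for the rank in claim (4).
for name, p in model.named_parameters():
    if p.dim() < 2 or "weight" not in name:
        continue
    W = p.detach().flatten(1)
    fro2 = W.pow(2).sum()
    spec2 = torch.linalg.matrix_norm(W, ord=2).pow(2)
    print(f"{name}: stable rank = {(fro2 / spec2).item():.2f}")

One would compare these measurements across independently trained models against their test accuracy or robustness to probe whether the correlations assumed by theory actually hold.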