Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

Abstract

We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.
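
As a rough illustration of the quantities behind claims (2) and (4), the following minimal PyTorch sketch (not the paper's code; the toy architecture and the use of torch.linalg.matrix_rank are assumptions made here for illustration) prints the Frobenius norm and numerical rank of each weight matrix in a small network:

import torch
import torch.nn as nn

torch.manual_seed(0)

# A small MLP stands in for the networks studied in the paper (illustrative only).
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

for name, param in model.named_parameters():
    if param.ndim != 2:          # skip bias vectors; keep weight matrices
        continue
    w = param.detach()
    frob_norm = w.norm().item()                    # parameter norm, as in claim (2)
    rank = torch.linalg.matrix_rank(w).item()      # numerical rank, as in claim (4)
    print(f"{name}: shape={tuple(w.shape)}, norm={frob_norm:.2f}, rank={rank}")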
