Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

Abstract
We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.
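To make the quantities behind claims (2) and (4) concrete, the sketch below (illustrative only, not the authors' code) computes the total parameter L2 norm and a per-layer stable rank for a PyTorch network; resnet18 is used here as a stand-in for any trained model, and stable rank is one common proxy for the rank of a weight matrix.

# Minimal sketch, assuming PyTorch and torchvision are installed; resnet18 is a
# hypothetical stand-in for a trained network under study.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)

# Total L2 norm of all parameters (the quantity claim (2) relates to generalization).
param_norm = torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters()))
print(f"total parameter L2 norm: {param_norm.item():.2f}")

# Stable rank ||W||_F^2 / ||W||_2^2 of each weight matrix, with conv kernels
# flattened to (out_channels, in_channels * k * k); a proxy for the rank in claim (4).
for name, p in model.named_parameters():
    if p.dim() < 2 or "weight" not in name:
        continue
    W = p.detach().flatten(1)
    fro2 = W.pow(2).sum()
    spec2 = torch.linalg.matrix_norm(W, ord=2).pow(2)
    print(f"{name}: stable rank = {(fro2 / spec2).item():.2f}")

One would compare these measurements across independently trained models against their test accuracy or robustness to probe whether the correlations assumed by theory actually hold.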