We explore the energy landscape of a simple neural network. In particular, we expand upon previous work demonstrating that the empirical complexity of fitted neural networks is far lower than a naive parameter count would suggest, and that this implicit regularization is beneficial for the generalization of fitted models.