
Representation Benefits of Deep Feedforward Networks

Abstract

This note provides a family of classification problems, indexed by a positive integer k, where all shallow networks with fewer than exponentially (in k) many nodes exhibit error at least 1/6, whereas a deep network with 2 nodes in each of 2k layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated k times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.
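As an informal illustration of how two ReLU nodes per layer can produce the rapid oscillation behind such a separation, the sketch below composes a "tent" map built from two ReLU units k times. The specific map and the names tent_layer / deep_relu_net are illustrative assumptions for this sketch, not details stated in the abstract.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent_layer(x):
    # One hidden layer with two ReLU units:
    #   m(x) = 2*relu(x) - 4*relu(x - 1/2)
    # On [0, 1] this is the tent map: 2x for x <= 1/2 and 2 - 2x for x >= 1/2.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_relu_net(x, k):
    # Compose the 2-node layer k times; the result is a sawtooth on [0, 1]
    # with 2**k monotone pieces, each crossing the level 1/2 exactly once.
    for _ in range(k):
        x = tent_layer(x)
    return x

if __name__ == "__main__":
    k = 4
    xs = np.linspace(0.0, 1.0, 2 ** (k + 3) + 1)
    labels = (deep_relu_net(xs, k) > 0.5).astype(int)
    flips = int(np.count_nonzero(np.diff(labels)))
    print(f"k={k}: labels flip {flips} times on [0, 1]")  # grows like 2**k
```

Thresholding the composed output at 1/2 yields a labeling that alternates on the order of 2**k times across the unit interval, which is the kind of highly oscillatory target a shallow network with few nodes cannot fit.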
