Understanding Deep Neural Networks with Rectified Linear Units

In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give the first polynomial-time (in the size of the data) algorithm to train a ReLU DNN with one hidden layer to global optimality. This follows from our complete characterization of the ReLU DNN function class, whereby we show that a function is representable by a ReLU DNN if and only if it is a continuous piecewise linear function. The main tool used to prove this characterization is an elegant result from tropical geometry. Further, for the ℝ → ℝ case, we show that a single hidden layer suffices to express all piecewise linear functions, and we give tight bounds on the size of such a ReLU DNN. We follow up with gap results showing that there is a smoothly parameterized family of "hard" functions that leads to an exponential blow-up in size if the number of layers is decreased by a small amount. An example consequence of our gap theorem is that for every natural number k, there exists a function representable by a ReLU DNN with depth k^2 + 1 and total size k^3, such that any ReLU DNN with depth at most k + 1 will require at least (1/2)·k^(k+1) − 1 total nodes. Finally, we construct a family of ℝ^n → ℝ functions for n ≥ 2 (also smoothly parameterized) whose number of affine pieces scales exponentially with the dimension n at any fixed size and depth. To the best of our knowledge, such a construction with exponential dependence on n has not been achieved by previous families of "hard" functions in the neural nets literature.
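
As a concrete illustration of the ℝ → ℝ case, the minimal Python sketch below (not code from the paper; the helper names and the standard breakpoint-slope identity it uses are assumptions added here for illustration) builds a one-hidden-layer ReLU network that exactly reproduces a given continuous piecewise linear function, and then composes a two-piece "sawtooth" with itself to show, in the spirit of the gap results, how the number of affine pieces can grow exponentially with depth.

```python
# Minimal sketch (illustration only, not code from the paper).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cpwl_as_one_hidden_layer(breakpoints, slopes, value_at_first_break):
    """One-hidden-layer ReLU net for a continuous piecewise linear f: R -> R.

    Uses the standard identity
        f(x) = f(b_1) + m_0*(x - b_1) + sum_i (m_i - m_{i-1}) * relu(x - b_i),
    where b_1 < ... < b_p are the breakpoints and m_0, ..., m_p the slopes.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    slopes = np.asarray(slopes, dtype=float)       # one more slope than breakpoints
    slope_jumps = np.diff(slopes)                  # output weights of the hidden ReLU units

    def f(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] - breakpoints)  # one hidden ReLU unit per breakpoint
        return (value_at_first_break
                + slopes[0] * (x - breakpoints[0])
                + hidden @ slope_jumps)
    return f

# Example: |x| has one breakpoint at 0 with slopes -1 and +1.
abs_net = cpwl_as_one_hidden_layer([0.0], [-1.0, 1.0], value_at_first_break=0.0)
xs = np.linspace(-2.0, 2.0, 9)
assert np.allclose(abs_net(xs), np.abs(xs))

# Depth vs. size, in the spirit of the "hard" functions: composing a 2-piece
# sawtooth (tent map) with itself k times yields 2^k affine pieces on [0, 1],
# so matching it exactly with a single hidden layer needs exponentially many units.
def sawtooth(x):
    return 2.0 * np.minimum(relu(x), relu(1.0 - x))  # itself expressible as a small ReLU net

def composed_sawtooth(x, k):
    for _ in range(k):
        x = sawtooth(x)
    return x
```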