Understanding Deep Neural Networks with Rectified Linear Units

In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give the first polynomial-time (in the size of the data) algorithm to train a ReLU DNN with one hidden layer to global optimality. This follows from our complete characterization of the ReLU DNN function class, whereby we show that a function is representable by a ReLU DNN if and only if it is a continuous piecewise linear function. The main tool used to prove this characterization is an elegant result from tropical geometry. Further, for the ℝ → ℝ case, we show that a single hidden layer suffices to express all piecewise linear functions, and we give tight bounds on the size of such a ReLU DNN. We follow up with gap results showing that there is a smoothly parameterized family of "hard" functions that leads to an exponential blow-up in size if the number of layers is decreased by a small amount. An example consequence of our gap theorem is that for every natural number k, there exists a function representable by a ReLU DNN with depth k^2 + 1 and total size k^3, such that any ReLU DNN with depth at most k + 1 will require at least (1/2)·k^(k+1) − 1 total nodes. Finally, we construct a family of ℝ^n → ℝ functions for n ≥ 2 (also smoothly parameterized) whose number of affine pieces scales exponentially with the dimension n at any fixed size and depth. To the best of our knowledge, such a construction with exponential dependence on n has not been achieved by previous families of "hard" functions in the neural nets literature.
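
As a concrete illustration of the ℝ → ℝ case, the minimal Python sketch below (not code from the paper; the helper names and the standard breakpoint-slope identity it uses are assumptions added here for illustration) builds a one-hidden-layer ReLU network that exactly reproduces a given continuous piecewise linear function, and then composes a two-piece "sawtooth" with itself to show, in the spirit of the gap results, how the number of affine pieces can grow exponentially with depth.

```python
# Minimal sketch (illustration only, not code from the paper).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cpwl_as_one_hidden_layer(breakpoints, slopes, value_at_first_break):
    """One-hidden-layer ReLU net for a continuous piecewise linear f: R -> R.

    Uses the standard identity
        f(x) = f(b_1) + m_0*(x - b_1) + sum_i (m_i - m_{i-1}) * relu(x - b_i),
    where b_1 < ... < b_p are the breakpoints and m_0, ..., m_p the slopes.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    slopes = np.asarray(slopes, dtype=float)       # one more slope than breakpoints
    slope_jumps = np.diff(slopes)                  # output weights of the hidden ReLU units

    def f(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] - breakpoints)  # one hidden ReLU unit per breakpoint
        return (value_at_first_break
                + slopes[0] * (x - breakpoints[0])
                + hidden @ slope_jumps)
    return f

# Example: |x| has one breakpoint at 0 with slopes -1 and +1.
abs_net = cpwl_as_one_hidden_layer([0.0], [-1.0, 1.0], value_at_first_break=0.0)
xs = np.linspace(-2.0, 2.0, 9)
assert np.allclose(abs_net(xs), np.abs(xs))

# Depth vs. size, in the spirit of the "hard" functions: composing a 2-piece
# sawtooth (tent map) with itself k times yields 2^k affine pieces on [0, 1],
# so matching it exactly with a single hidden layer needs exponentially many units.
def sawtooth(x):
    return 2.0 * np.minimum(relu(x), relu(1.0 - x))  # itself expressible as a small ReLU net

def composed_sawtooth(x, k):
    for _ in range(k):
        x = sawtooth(x)
    return x
```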