Expressiveness of Rectifier Networks

Rectified Linear Units (ReLUs) have been shown to ameliorate the vanishing gradient problem, allow for efficient back-propagation, and empirically promote sparsity in the learned parameters. Their use has led to state-of-the-art results in a variety of applications. In this paper, we characterize the expressiveness of ReLU networks, which, unlike networks with sign (threshold) or sigmoid activations, are less explored from this perspective. We show that, while the decision boundary of a two-layer ReLU network can be captured by a sign network, the sign network can require an exponentially larger number of hidden units. Furthermore, we formulate sufficient conditions under which a sign network can be represented as a ReLU network with a corresponding logarithmic reduction in the number of hidden units. Finally, using synthetic data, we experimentally demonstrate that back-propagation can recover the much smaller ReLU networks predicted by the theory.
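
As a concrete illustration of the final claim, the sketch below trains a small two-layer ReLU network with plain back-propagation on synthetic 2D data and reports the resulting accuracy. This is not the paper's code: the XOR-like dataset, hidden-layer width, and hyperparameters are assumptions chosen only to show the mechanics of learning a compact ReLU network whose decision boundary is piecewise linear.

```python
# Minimal sketch (assumed setup, not the paper's experiment): train a small
# two-layer ReLU network with plain back-propagation on synthetic 2D data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic XOR-like data: label 1 when the two coordinates share a sign.
X = rng.uniform(-1.0, 1.0, size=(512, 2))
y = (np.sign(X[:, 0]) == np.sign(X[:, 1])).astype(float)

H = 8                                 # hidden units (illustrative choice)
W1 = rng.normal(0, 0.5, (2, H))
b1 = np.zeros(H)
w2 = rng.normal(0, 0.5, H)
b2 = 0.0
lr = 0.5

def forward(X):
    z = X @ W1 + b1                   # hidden pre-activations
    h = np.maximum(z, 0.0)            # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # sigmoid output for training
    return z, h, p

for step in range(2000):
    z, h, p = forward(X)
    # Gradient of mean cross-entropy w.r.t. the output pre-activation.
    g_out = (p - y) / len(X)
    # Back-propagate through the output layer and the ReLU.
    g_w2 = h.T @ g_out
    g_b2 = g_out.sum()
    g_h = np.outer(g_out, w2) * (z > 0)
    g_W1 = X.T @ g_h
    g_b1 = g_h.sum(axis=0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

# The decision boundary is where the piecewise-linear logit crosses zero.
_, _, p = forward(X)
print("training accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```

A sign network representing the same XOR-like boundary would need one threshold unit per linear piece, whereas the ReLU network above composes the pieces with far fewer hidden units; this mirrors the exponential gap the abstract describes.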