Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth

Abstract

We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$ defined in the paper. These results capture the precise benefits of depth in the following sense:

1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$ ReLU layers are sharp up to constants, as shown by matching lower bounds.
2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$, and as $D$ grows the class $\mathcal{G}_{D}$ contains progressively less smooth functions.
3. If $D' < D$, then the approximation rate for the class $\mathcal{G}_D$ achieved by depth-$D'$ networks is strictly worse than that achieved by depth-$D$ networks.

This constitutes a fine-grained characterization of the representation power of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in contrast to existing representation results, which either require $D$ to grow quickly with $N$ or assume that the function being represented is highly smooth. In the latter case, similar rates can be obtained with a single nonlinear layer. Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions; indeed, the main technical novelty is to fully exploit the fact that deep networks can produce highly oscillatory functions with few activation functions.
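The abstract does not spell out the paper's construction. As a minimal sketch of the phenomenon it exploits (depth buying oscillation cheaply), assuming the classical tent-map "sawtooth" construction rather than the authors' own, the snippet below builds a tent map on $[0, 1]$ from two ReLU units and composes it $D$ times. The names `relu`, `tent`, `sawtooth`, and `count_linear_pieces` are illustrative, not taken from the paper. Exact `Fraction` arithmetic is used so the piece count is not affected by floating-point round-off.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def relu(x: Fraction) -> Fraction:
    """ReLU activation, max(x, 0)."""
    return max(x, Fraction(0))


def tent(x: Fraction) -> Fraction:
    """Tent map on [0, 1] built from two ReLU units (illustrative construction).

    2*relu(x) - 4*relu(x - 1/2) equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    """
    return 2 * relu(x) - 4 * relu(x - HALF)


def sawtooth(x: Fraction, depth: int) -> Fraction:
    """Compose the tent map `depth` times: a depth-`depth` ReLU net, 2 units per layer."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth: int, oversample: int = 4) -> int:
    """Count the maximal affine pieces of sawtooth(., depth) on [0, 1].

    All breakpoints lie on the dyadic grid k / 2**depth, so sampling on a finer
    grid containing it and counting runs of equal slope gives the exact count.
    """
    m = oversample * 2 ** depth
    xs = [Fraction(i, m) for i in range(m + 1)]
    ys = [sawtooth(x, depth) for x in xs]
    # Slope on each grid interval [i/m, (i+1)/m].
    slopes = [(ys[i + 1] - ys[i]) * m for i in range(m)]
    pieces = 1
    for prev, cur in zip(slopes, slopes[1:]):
        if cur != prev:
            pieces += 1
    return pieces


if __name__ == "__main__":
    for d in range(1, 7):
        # depth, total ReLU units, number of affine pieces
        print(d, 2 * d, count_linear_pieces(d))
```

With $2D$ ReLU units, the depth-$D$ composition has $2^D$ affine pieces, doubling with each added layer. By contrast, a single hidden layer of $N$ ReLU units on the real line produces at most $N + 1$ pieces, so matching this sawtooth with one layer needs at least $2^D - 1$ units.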
