
Optimal Approximation Rates and Metric Entropy of ReLU$^k$ and Cosine Networks

Abstract

This article addresses several fundamental issues in the approximation theory of neural networks, including the characterization of approximation spaces, the determination of the metric entropy of these spaces, and the approximation rates of neural networks. For any activation function $\sigma$, we show that the largest Banach space of functions which can be efficiently approximated by the corresponding shallow neural networks is the space whose norm is given by the gauge of the closed convex hull of the set $\{\pm\sigma(\omega\cdot x + b)\}$. We characterize this space for the ReLU$^k$ and cosine activation functions and, in particular, show that the resulting gauge space is equivalent to the spectral Barron space when $\sigma=\cos$ and to the Barron space when $\sigma={\rm ReLU}$. Our main result establishes the precise asymptotics of the $L^2$-metric entropy of the unit ball of these gauge spaces and, as a consequence, the optimal approximation rates for shallow ReLU$^k$ networks. The sharpest previous results hold only in the special case $k=0$ and $d=2$, where the metric entropy has been determined up to logarithmic factors. When $k>0$ or $d>2$, there is a significant gap between the previous best upper and lower bounds. We close all of these gaps and determine the precise asymptotics of the metric entropy for all $k\geq 0$ and $d\geq 2$, in particular removing the logarithmic factors mentioned above. We then use these results to quantify how much is lost by Barron's spectral condition relative to the convex hull of $\{\pm\sigma(\omega\cdot x + b)\}$ when $\sigma={\rm ReLU}^k$. Finally, we show that the orthogonal greedy algorithm can algorithmically realize the improved approximation rates.
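For readers unfamiliar with the gauge construction mentioned above, the following is a minimal sketch of the standard definition (the Minkowski functional of the closed convex hull of the dictionary); the symbols $\mathbb{D}_\sigma$ and $\|\cdot\|_{\mathcal{K}(\mathbb{D}_\sigma)}$ are illustrative names, not notation taken from the paper:

$$
\mathbb{D}_\sigma = \{\pm\sigma(\omega\cdot x + b)\},\qquad
\|f\|_{\mathcal{K}(\mathbb{D}_\sigma)} = \inf\bigl\{\, t > 0 \;:\; f \in t\cdot\overline{\operatorname{conv}}\,(\mathbb{D}_\sigma) \,\bigr\},
$$

so the associated Banach space consists of those $f$ for which this infimum is finite, and its unit ball is exactly $\overline{\operatorname{conv}}\,(\mathbb{D}_\sigma)$.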
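Since the abstract states that the orthogonal greedy algorithm realizes the approximation rates, here is a minimal, hedged sketch of that generic algorithm over a finite, discretized ReLU$^k$ ridge dictionary. All names, the sampling of directions $\omega$ and offsets $b$, and the discretization are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def orthogonal_greedy(f, dictionary, n_terms):
    """Sketch of the orthogonal greedy algorithm (OGA) over a finite dictionary.

    f          : (N,) array, target function sampled on N points.
    dictionary : (m, N) array, each row a dictionary element sampled on the same points.
    n_terms    : number of greedy steps (size of the resulting expansion).
    """
    residual = f.copy()
    selected = []
    coeffs = np.zeros(0)
    for _ in range(n_terms):
        # Greedy step: pick the element most correlated with the current residual.
        scores = np.abs(dictionary @ residual)
        selected.append(int(np.argmax(scores)))
        # Orthogonal step: project f onto the span of all elements selected so far.
        A = dictionary[selected].T                     # (N, len(selected))
        coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
        residual = f - A @ coeffs
    return selected, coeffs

# Hypothetical usage: ReLU^1 ridge dictionary on [0,1]^2 with random (omega, b).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))                         # sample points in [0,1]^2
W = rng.normal(size=(2000, 2))                         # random directions omega
b = rng.uniform(-2.0, 2.0, size=2000)                  # random offsets
D = np.maximum(W @ X.T + b[:, None], 0.0) ** 1         # rows: ReLU^1(omega.x + b)
D /= np.linalg.norm(D, axis=1, keepdims=True)          # normalize dictionary rows
f = np.sin(2 * np.pi * X[:, 0]) * X[:, 1]              # an arbitrary target
idx, c = orthogonal_greedy(f, D, n_terms=50)
```

The re-projection onto all previously selected elements is what distinguishes the orthogonal greedy algorithm from the pure greedy variant and is the mechanism that yields its favorable convergence behavior for functions in the convex-hull (variation) space.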
