Depth Separations in Neural Networks: What is Actually Being Separated?

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$. However, the functions used to demonstrate this are rapidly oscillating, with a Lipschitz parameter scaling polynomially with the dimension $d$ (or equivalently, by rescaling the function, the hardness result applies to $\mathcal{O}(1)$-Lipschitz functions only when the target accuracy $\epsilon$ is at most $\mathrm{poly}(1/d)$). In this paper, we study whether such depth separations might still hold in the natural setting of $\mathcal{O}(1)$-Lipschitz radial functions, when $\epsilon$ does not scale with $d$. Perhaps surprisingly, we show that the answer is negative: in contrast to the intuition suggested by previous work, it \emph{is} possible to approximate $\mathcal{O}(1)$-Lipschitz radial functions with depth $2$, size $\mathrm{poly}(d)$ networks, for every constant target accuracy $\epsilon$. We complement this by showing that approximating such functions is also possible with depth $2$, size $\mathrm{poly}(1/\epsilon)$ networks, for every constant dimension $d$. Finally, we show that it is not possible to have polynomial dependence on both $d$ and $1/\epsilon$ simultaneously. Overall, our results indicate that in order to show depth separations for expressing $\mathcal{O}(1)$-Lipschitz functions with constant accuracy -- if at all possible -- one would need fundamentally different techniques than the existing ones in the literature.
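To fix notation for the quantities discussed above, a minimal formalization of the setting might read as follows; the ReLU activation, the squared-error criterion, and the input distribution $\mu$ are illustrative assumptions rather than details taken from the abstract.

% Illustrative formalization (assumptions: ReLU activation, error measured in L^2 w.r.t. an input distribution \mu).
A depth $2$ (one-hidden-layer) network of width $k$ computes
\[
  N_2(\mathbf{x}) \;=\; \sum_{i=1}^{k} u_i\,\sigma\big(\langle \mathbf{w}_i, \mathbf{x}\rangle + b_i\big),
  \qquad \sigma(z)=\max\{z,0\}, \quad \mathbf{x}\in\mathbb{R}^d .
\]
A radial target is $f(\mathbf{x}) = \varphi(\|\mathbf{x}\|)$ with $\varphi:\mathbb{R}_{\ge 0}\to\mathbb{R}$ an $\mathcal{O}(1)$-Lipschitz profile, and approximation to accuracy $\epsilon$ means
\[
  \mathbb{E}_{\mathbf{x}\sim\mu}\big[(N_2(\mathbf{x}) - f(\mathbf{x}))^2\big] \;\le\; \epsilon^2 .
\]
In these terms, the results above concern how the required width $k$ scales: $k=\mathrm{poly}(d)$ suffices for any fixed $\epsilon$, $k=\mathrm{poly}(1/\epsilon)$ suffices for any fixed $d$, but no depth $2$ construction achieves $k=\mathrm{poly}(d,1/\epsilon)$ jointly.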