Depth Separation for Neural Networks

Abstract

Let $f:\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}\to\mathbb{R}$ be a function of the form $f(\mathbf{x},\mathbf{x}') = g(\langle\mathbf{x},\mathbf{x}'\rangle)$ for some $g:[-1,1]\to \mathbb{R}$. We give a simple proof showing that poly-size depth-two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low-degree polynomial. Moreover, for many choices of $g$, such as $g(x)=\sin(\pi d^3 x)$, the number of neurons must be $2^{\Omega\left(d\log(d)\right)}$. Furthermore, the result holds w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$. As many functions of the above form can be well approximated by poly-size depth-three networks with poly-bounded weights, this establishes a separation between depth-two and depth-three networks w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.
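
To make the setup concrete, the following NumPy sketch (an illustrative assumption, not code from the paper; the helper names `sample_sphere` and `hard_target` are ours) evaluates the hard target $f(\mathbf{x},\mathbf{x}')=\sin(\pi d^3\langle\mathbf{x},\mathbf{x}'\rangle)$ on input pairs drawn uniformly from $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$, the distribution for which the lower bound holds.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1} (normalize Gaussians)."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def hard_target(x, xp, d):
    """Illustrative sketch of the separating function f(x, x') = g(<x, x'>)
    with g(t) = sin(pi * d^3 * t), as in the abstract."""
    inner = np.sum(x * xp, axis=1)        # <x, x'> lies in [-1, 1]
    return np.sin(np.pi * d**3 * inner)

# Evaluate f on a small batch of pairs drawn uniformly from S^{d-1} x S^{d-1}.
rng = np.random.default_rng(0)
d, n = 10, 5
x, xp = sample_sphere(n, d, rng), sample_sphere(n, d, rng)
print(hard_target(x, xp, d))
```

Because this $g$ oscillates on the order of $d^3$ times over $[-1,1]$, it cannot be tracked by a low-degree polynomial in $\langle\mathbf{x},\mathbf{x}'\rangle$, which is exactly the property the depth-two lower bound exploits.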
