
Depth Separation for Neural Networks

Annual Conference Computational Learning Theory (COLT), 2017
Abstract

Let $f:\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}\to\mathbb{R}$ be a function of the form $f(\mathbf{x},\mathbf{x}') = g(\langle\mathbf{x},\mathbf{x}'\rangle)$ for $g:[-1,1]\to \mathbb{R}$. We give a simple proof showing that poly-size depth-two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low-degree polynomial. Moreover, for many $g$'s, such as $g(x)=\sin(\pi d^3 x)$, the number of neurons must be $2^{\Omega\left(d\log(d)\right)}$. Furthermore, the result holds w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$. As many functions of the above form can be well approximated by poly-size depth-three networks with poly-bounded weights, this establishes a separation between depth-two and depth-three networks w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.
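To make the hard target concrete, here is a minimal numerical sketch (not from the paper) of the inner-product function $f(\mathbf{x},\mathbf{x}')=g(\langle\mathbf{x},\mathbf{x}'\rangle)$ with the oscillatory choice $g(t)=\sin(\pi d^3 t)$, evaluated at points drawn uniformly from the sphere. It assumes NumPy; the names `hard_function` and `random_unit_vector` are hypothetical helpers, not from the paper.

```python
import numpy as np

def hard_function(x, xp, d):
    # f(x, x') = g(<x, x'>) with g(t) = sin(pi * d^3 * t):
    # a rapidly oscillating function of the inner product,
    # the kind of target the abstract says depth-two networks cannot approximate.
    return np.sin(np.pi * d**3 * np.dot(x, xp))

def random_unit_vector(d, rng):
    # Uniform sample on S^{d-1}: normalize a standard Gaussian vector.
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
d = 5
x = random_unit_vector(d, rng)
xp = random_unit_vector(d, rng)
y = hard_function(x, xp, d)
# y lies in [-1, 1] since g = sin.
```

Note that as $d$ grows, $g$ oscillates on the scale $1/d^3$ over $[-1,1]$, which is why no low-degree polynomial in the inner product can track it.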
