
On approximating $\nabla f$ with neural networks

Abstract

Consider a feedforward neural network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ such that $\psi \approx \nabla f$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function; therefore $\psi$ must satisfy $\partial_j \psi_i = \partial_i \psi_j$ pointwise. We prove a theorem that a $\psi$ network with more than one hidden layer can only represent one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward, where two backward paths and a weight-tying matrix play the key roles. We then present the alternative, the implicit parametrization, where the neural network is $\phi: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\nabla \phi \approx \nabla f$; in addition, a "soft analysis" of $\nabla \phi$ gives a dual perspective on the theorem. Throughout, we come back to recent probabilistic models that are formulated as $\nabla \phi \approx \nabla f$, and conclude with a critique of denoising autoencoders.
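
As a concrete illustration of the two parametrizations described above (a sketch, not code from the paper), the following JAX snippet builds a direct network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ and an implicitly parametrized $\nabla \phi$ obtained by differentiating a scalar-output network $\phi$; the layer sizes, initialization, and the `mlp` helper are illustrative assumptions. It checks numerically that the Jacobian of $\nabla \phi$ (the Hessian of $\phi$) is symmetric, so the constraint $\partial_j \psi_i = \partial_i \psi_j$ holds automatically, whereas the Jacobian of a generic $\psi$ does not satisfy it.

```python
# Minimal sketch (illustrative, not the paper's implementation) of the
# explicit psi network vs. the implicit grad(phi) parametrization, in JAX.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Initialize a fully connected network with the given layer sizes."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        W = jax.random.normal(sub, (d_out, d_in)) / jnp.sqrt(d_in)
        params.append((W, jnp.zeros(d_out)))
    return params

def mlp(params, x):
    """Forward pass with tanh hidden layers and a linear output layer."""
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b

d = 4
key = jax.random.PRNGKey(0)

# Explicit parametrization: psi maps R^d -> R^d directly. Its Jacobian is
# not symmetric in general, so psi is generally not the gradient of any f.
psi_params = init_mlp(key, [d, 32, 32, d])
psi = lambda x: mlp(psi_params, x)

# Implicit parametrization: phi maps R^d -> R, and grad(f) is modeled by
# grad(phi). The Jacobian of grad(phi) is the Hessian of phi, which is
# symmetric by construction, so d_j psi_i = d_i psi_j holds automatically.
phi_params = init_mlp(key, [d, 32, 32, 1])
phi = lambda x: mlp(phi_params, x)[0]          # scalar output
grad_phi = jax.grad(phi)                       # a map R^d -> R^d

x = jax.random.normal(jax.random.PRNGKey(1), (d,))
J_psi = jax.jacobian(psi)(x)
J_phi = jax.jacobian(grad_phi)(x)              # Hessian of phi at x
print("asymmetry of psi Jacobian:     ", jnp.max(jnp.abs(J_psi - J_psi.T)))
print("asymmetry of grad(phi) Jacobian:", jnp.max(jnp.abs(J_phi - J_phi.T)))
```

The second printed value is zero (up to numerical precision) regardless of the weights, which is the point of the implicit parametrization: the symmetry constraint is enforced by construction rather than restricting the network's architecture.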
