Learning One-hidden-layer Neural Networks with Landscape Design

Abstract

We consider the problem of learning a one-hidden-layer neural network: we assume the input $x \in \mathbb{R}^d$ is drawn from a Gaussian distribution and the label satisfies $y = a^\top \sigma(Bx) + \xi$, where $a$ is a nonnegative vector in $\mathbb{R}^m$ with $m \le d$, $B \in \mathbb{R}^{m \times d}$ is a full-rank weight matrix, and $\xi$ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by this formula, we design a non-convex objective function $G(\cdot)$ whose landscape is guaranteed to have the following properties: 1. All local minima of $G$ are also global minima. 2. All global minima of $G$ correspond to the ground-truth parameters. 3. The value and gradient of $G$ can be estimated from samples. With these properties, stochastic gradient descent on $G$ provably converges to a global minimum and learns the ground-truth parameters. We also prove a finite-sample complexity result and validate the results with simulations.
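
The tensor-decomposition view can be made concrete with a standard Hermite-expansion computation. The following is a sketch under the illustrative assumptions that the rows $b_i$ of $B$ have unit norm and that $\sigma$ has Hermite coefficients $\hat\sigma_k$ (in the orthonormal Hermite basis); it conveys the flavor of the argument rather than the paper's exact statement. For $x \sim \mathcal{N}(0, I_d)$ and unit vectors $u, v$,

\[ \mathbb{E}\big[\sigma(u^\top x)\,\sigma(v^\top x)\big] = \sum_{k \ge 0} \hat\sigma_k^2\,(u^\top v)^k, \]

so the cross term of the population risk between the ground truth $(a, B)$ and a candidate $(a', B')$ becomes

\[ \mathbb{E}\big[a^\top \sigma(Bx) \cdot a'^\top \sigma(B'x)\big] = \sum_{k \ge 0} \hat\sigma_k^2\, \Big\langle \sum_i a_i\, b_i^{\otimes k},\ \sum_j a'_j\, (b'_j)^{\otimes k} \Big\rangle. \]

Minimizing the squared loss therefore implicitly tries to match the rank-$m$ tensors $T_k = \sum_i a_i\, b_i^{\otimes k}$ at every order $k$ simultaneously.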

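For a concrete end-to-end picture, the sketch below simulates the assumed model and runs plain stochastic gradient descent on the empirical squared loss. It is a baseline illustration only: the choice of activation (ReLU), the problem sizes, and all hyperparameters are hypothetical, and the paper's actual algorithm optimizes the designed objective $G$ rather than this loss.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (hypothetical, not from the paper).
d, m, n = 20, 5, 10000

# Ground truth: a nonnegative, B full rank (unit-norm rows for convenience).
a_true = rng.uniform(0.5, 1.5, size=m)
B_true = rng.standard_normal((m, d))
B_true /= np.linalg.norm(B_true, axis=1, keepdims=True)

def relu(t):
    return np.maximum(t, 0.0)

# Data from the assumed model: x ~ N(0, I_d), y = a^T sigma(Bx) + noise.
X = rng.standard_normal((n, d))
y = relu(X @ B_true.T) @ a_true + 0.01 * rng.standard_normal(n)

# Plain SGD on the empirical squared loss 0.5 * (a^T sigma(Bx) - y)^2.
# (The paper instead optimizes a designed objective G with a benign landscape.)
a_hat = rng.uniform(0.5, 1.5, size=m)
B_hat = rng.standard_normal((m, d)) / np.sqrt(d)
lr, batch = 1e-2, 64
for step in range(20000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    Z = Xb @ B_hat.T                                # pre-activations, (batch, m)
    H = relu(Z)                                     # hidden activations
    r = H @ a_hat - yb                              # residuals, (batch,)
    grad_a = H.T @ r / batch
    dZ = (r[:, None] * a_hat[None, :]) * (Z > 0)    # backprop through relu
    grad_B = dZ.T @ Xb / batch
    a_hat -= lr * grad_a
    B_hat -= lr * grad_B

mse = np.mean((relu(X @ B_hat.T) @ a_hat - y) ** 2)
print(f"final empirical squared loss: {mse:.4f}")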
