Recovery Guarantees for One-hidden-layer Neural Networks

10 June 2017
Kai Zhong
Zhao Song
Prateek Jain
Peter L. Bartlett
Inderjit S. Dhillon
    MLT
arXiv:1706.03175
Abstract

In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees for gradient descent under a resampling rule. For homogeneous activations, we show that tensor methods are able to initialize the parameters so that they fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k,\lambda)$ for smooth homogeneous activations with high probability, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision, and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
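To make the setting concrete, the following is a minimal, illustrative sketch (not the paper's algorithm) of gradient descent on the 1NN squared-loss objective for a model y = sum_i v_i * phi(w_i^T x), using a squared-ReLU activation, Gaussian inputs, and output weights assumed known. The dimensions, step size, and the perturbed initialization (a stand-in for the paper's tensor-method initialization) are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

def phi(z):
    # Squared ReLU: smooth and homogeneous, one of the activations covered by the paper.
    return np.maximum(z, 0.0) ** 2

def phi_prime(z):
    return 2.0 * np.maximum(z, 0.0)

d, k, n = 20, 3, 5000                    # input dimension, hidden nodes, samples (illustrative)
W_star = rng.standard_normal((k, d))
W_star /= np.linalg.norm(W_star, axis=1, keepdims=True)  # unit-norm ground-truth rows
v = np.ones(k)                           # output-layer weights, assumed known here

X = rng.standard_normal((n, d))          # Gaussian inputs
y = phi(X @ W_star.T) @ v                # noiseless labels from the ground-truth network

# Stand-in for tensor initialization: start close enough to the ground truth
# to land in the locally strongly convex neighborhood.
W = W_star + 0.02 * rng.standard_normal((k, d))

lr = 0.05
for step in range(500):
    Z = X @ W.T                          # (n, k) pre-activations
    resid = phi(Z) @ v - y               # (n,) residuals
    grad = (resid[:, None] * phi_prime(Z) * v).T @ X / n  # gradient of the average squared loss
    W -= lr * grad

print("relative parameter error:", np.linalg.norm(W - W_star) / np.linalg.norm(W_star))

The paper's guarantee additionally relies on a resampling rule (fresh samples at each iteration) and on the tensor-method initialization; the sketch above omits both and only illustrates local convergence from a nearby starting point.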
