arXiv:1905.10826
On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

26 May 2019
Lili Su
Pengkun Yang
    MLT
Abstract

We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activations using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution $\rho$ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying/target function $f^*$ that generates the responses/labels. We show that if $f^*$ admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by $f^*$ and $\rho$ only, i.e., the rate is independent of the sample size $n$. Furthermore, if $f^*$ has zero low-rank approximation error, then, as long as the width of the neural network is $\Omega(n \log n)$, the empirical risk decreases to $\Theta(1/\sqrt{n})$. To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where $\rho$ is the uniform distribution on the spheres and $f^*$ is a polynomial. Throughout this paper, we consider the scenario where the input dimension $d$ is fixed.
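To make the setup concrete, below is a minimal numerical sketch, not the paper's exact construction: a width-$m$ two-layer ReLU network $f(x) = \frac{1}{\sqrt{m}}\sum_{r=1}^{m} a_r\,\mathrm{relu}(w_r^\top x)$ with the outer weights $a_r$ fixed at random signs and only the first-layer weights trained by GD. In this line of work the prediction-error vector $e_t$ evolves roughly as $e_{t+1} \approx (I - \eta H)\,e_t$, where $H$ is a Gram matrix that concentrates around the integral operator mentioned above. The width scaling, step size, and polynomial target used here are illustrative assumptions.

import numpy as np

# Minimal sketch (assumptions, not the paper's exact construction):
# two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>),
# outer weights a_r fixed at random signs, first-layer weights W trained by GD.

rng = np.random.default_rng(0)

d, n = 5, 200                          # fixed input dimension, sample size
m = int(4 * n * np.log(n))             # nearly-linear over-parameterization (illustrative)

# Feature vectors drawn uniformly on the unit sphere; labels from a polynomial target f*.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] ** 2 - X[:, 1] * X[:, 2]   # hypothetical low-degree polynomial f*

W = rng.standard_normal((m, d))        # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # outer weights (fixed)

def predict(W):
    # Network output on the training inputs.
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

eta = 1.0
for t in range(301):
    err = predict(W) - y               # prediction errors e_t across GD iterations
    S = X @ W.T > 0.0                  # ReLU activation pattern, shape (n, m)
    # Gradient of the empirical risk (1/(2n)) * ||e_t||^2 with respect to W.
    grad = (S * err[:, None]).T @ X * a[:, None] / (np.sqrt(m) * n)
    W -= eta * grad                    # GD step
    if t % 50 == 0:
        print(f"iter {t:3d}  empirical risk {0.5 * np.mean(err ** 2):.4f}")

The activation pattern S and the error vector err play the role of the matrices and prediction errors whose evolution across GD iterations the abstract describes.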
