Generalization and Stability of Interpolating Neural Networks with Minimal Width

18 February 2023
Hossein Taheri
Christos Thrampoulidis
arXiv:2302.09235
Abstract

We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where the model weights can achieve arbitrarily small training error $\epsilon$ while their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2/T)$ and generalization error $O(g(1/T)^2/n)$ at iteration $T$, provided there are at least $m = \Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses the special case where the data are separable by the model's neural tangent kernel. For this setting and logistic-loss minimization, we prove that the training loss decays at a rate of $\tilde{O}(1/T)$ given a polylogarithmic number of neurons $m = \Omega(\log^4(T))$. Moreover, with $m = \Omega(\log^4(n))$ neurons and $T \approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization guarantees obtained via the algorithmic-stability framework, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the objective's non-convexity, this yields convergence and generalization-gap bounds that resemble those in the convex setting of linear logistic regression.
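To make the setting concrete, the sketch below trains a shallow (one-hidden-layer) network with full-batch gradient descent on the logistic loss and tracks the two quantities the abstract reasons about: the training loss at iteration $T$ and the distance of the weights from initialization, i.e. $g(\epsilon)$. This is an illustrative toy reconstruction, not the authors' code; the data, width m, step size, activation, and the NTK-style parameterization (fixed random output signs, $1/\sqrt{m}$ output scaling, first layer trained) are assumptions chosen to mirror common analyses of this regime.

```python
# Minimal sketch (not the authors' implementation) of training a shallow
# network f(x) = (1/sqrt(m)) * sum_j a_j * phi(w_j^T x) with fixed output
# signs a_j and first-layer weights W trained by full-batch gradient descent
# on the logistic loss. We print the training loss and ||W - W0||, the
# distance from initialization that plays the role of g(eps) in the abstract.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 20, 512          # samples, input dim, hidden width (illustrative)
eta, T = 1.0, 500               # step size and number of GD iterations

# Synthetic linearly separable data (a simple stand-in for NTK-separable data).
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = np.sign(X @ w_star)

# Shallow net: smooth activation (softplus), random fixed output signs.
a = rng.choice([-1.0, 1.0], size=m)
W0 = rng.normal(size=(m, d))    # initialization
W = W0.copy()

softplus = lambda z: np.logaddexp(0.0, z)       # log(1 + e^z), numerically safe
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(W, X):
    """Network output f(x_i) for all samples, shape (n,)."""
    return (softplus(X @ W.T) @ a) / np.sqrt(m)

for t in range(T):
    margins = y * forward(W, X)                    # y_i * f(x_i)
    loss = np.mean(np.logaddexp(0.0, -margins))    # logistic loss

    # Gradient of the average logistic loss w.r.t. the first-layer weights W:
    # d loss / d f_i = -y_i * sigmoid(-y_i f_i), and softplus' = sigmoid.
    coef = -sigmoid(-margins) * y                  # shape (n,)
    act_grad = sigmoid(X @ W.T)                    # shape (n, m)
    grad = ((coef[:, None] * act_grad) * a).T @ X / (np.sqrt(m) * n)
    W -= eta * grad

    if t % 100 == 0:
        dist = np.linalg.norm(W - W0)              # distance from initialization
        print(f"iter {t:4d}  train loss {loss:.4f}  ||W - W0|| = {dist:.3f}")
```

Under this parameterization the network stays close to its initialization while the training loss decays, which is the qualitative behavior the paper's bounds quantify; the exact architecture, scaling, and assumptions used in the paper's proofs may differ from this toy setup.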
