Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

17 October 2018
Chulhee Yun
Suvrit Sra
Ali Jadbabaie
arXiv:1810.07770
Abstract

We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an $L$-layer network with $W$ parameters in the hidden layers can memorize $N$ data points if $W = \Omega(N)$. Combined with a recent upper bound $O(WL\log W)$ on VC dimension, our construction is nearly tight for any fixed $L$. Subsequently, we analyze memorization capacity of residual networks under a general position assumption; we prove results that substantially reduce the known requirement of $N$ hidden nodes. Finally, we study the dynamics of stochastic gradient descent (SGD), and show that when initialized near a memorizing global minimum of the empirical risk, SGD quickly finds a nearby point with much smaller empirical risk.
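The paper's sufficiency results are constructive, not training-based. As a minimal empirical sketch only (assuming PyTorch, an illustrative width multiplier, Adam, and a fixed step budget, none of which come from the paper), the snippet below trains a ReLU network with two hidden layers, one common reading of "3-layer", whose width scales like $\sqrt{N}$, to fit $N$ points with random labels and check how close it gets to interpolation.

```python
# Hypothetical illustration, NOT the paper's explicit construction:
# train a small ReLU network with hidden widths on the order of sqrt(N)
# and see how well it fits N random labels.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

N, d = 1024, 16                   # number of data points and input dimension
width = 4 * int(math.sqrt(N))     # hidden width ~ sqrt(N); the factor 4 is arbitrary

X = torch.randn(N, d)             # random inputs
y = torch.randn(N, 1)             # random real-valued labels to memorize

# Two hidden ReLU layers plus a linear output layer.
model = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# A training MSE near zero indicates (approximate) memorization of all N points.
print(f"final training MSE: {loss.item():.2e}")
```

With $N = 1024$ and width $\approx 4\sqrt{N} = 128$, the network has well over $N$ parameters, so near-zero training error is unsurprising; shrinking the width multiplier gives a rough empirical feel for where interpolation starts to fail, though the paper's tight $\Theta(\sqrt{N})$ characterization is proved analytically rather than by training.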
