A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation

26 May 2025
Etienne Boursier
Scott Pesme
Radu-Alexandru Dragomir
Main: 10 pages · 6 figures · Bibliography: 3 pages · Appendix: 17 pages
Abstract

We study the dynamics of gradient flow with small weight decay on general training losses $F: \mathbb{R}^d \to \mathbb{R}$. Under mild regularity assumptions and assuming convergence of the unregularised gradient flow, we show that the trajectory with weight decay $\lambda$ exhibits a two-phase behaviour as $\lambda \to 0$. During the initial fast phase, the trajectory follows the unregularised gradient flow and converges to a manifold of critical points of $F$. Then, at time of order $1/\lambda$, the trajectory enters a slow drift phase and follows a Riemannian gradient flow minimising the $\ell_2$-norm of the parameters. This purely optimisation-based phenomenon offers a natural explanation for the \textit{grokking} effect observed in deep learning, where the training loss rapidly reaches zero while the test loss plateaus for an extended period before suddenly improving. We argue that this generalisation jump can be attributed to the slow norm reduction induced by weight decay, as explained by our analysis. We validate this mechanism empirically on several synthetic regression tasks.
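The two-phase picture can be illustrated with a minimal numerical sketch, not taken from the paper: plain gradient descent (a discretisation of the gradient flow) with a small weight decay $\lambda$ on an over-parameterised linear regression. All quantities in the snippet (dimensions, step size, $\lambda$, number of steps) are arbitrary choices made for the demo, not the paper's settings.

import numpy as np

# Toy sketch of the two-phase behaviour: gradient descent with weight decay
# on an over-parameterised least-squares problem. Dimensions, step size and
# lambda below are arbitrary demo values, not taken from the paper.
rng = np.random.default_rng(0)
d, n = 50, 10                              # more parameters than samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)                     # generic initialisation

lr, lam, steps = 1e-2, 1e-3, 200_000       # 1/lam sets the slow-phase time scale
for t in range(steps):
    grad_F = X.T @ (X @ w - y) / n         # gradient of the unregularised loss F
    w -= lr * (grad_F + lam * w)           # gradient step with weight decay lambda
    if t % 20_000 == 0:
        loss = 0.5 * np.mean((X @ w - y) ** 2)
        print(f"step {t:>7}  train loss {loss:.2e}  ||w|| {np.linalg.norm(w):.3f}")

# Expected pattern: the loss drops to ~0 within the first few thousand steps
# (fast phase: interpolation), while ||w|| keeps shrinking slowly afterwards
# on the 1/lambda time scale (slow phase: norm minimisation along the
# manifold of interpolating solutions).

On this toy problem the two time scales are visible directly in the printed trace; the paper's analysis concerns the continuous-time gradient flow on general losses $F$, of which this least-squares setting is only a special case.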

@article{boursier2025_2505.20172,
  title={A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation},
  author={Etienne Boursier and Scott Pesme and Radu-Alexandru Dragomir},
  journal={arXiv preprint arXiv:2505.20172},
  year={2025}
}