Why Gradients Rapidly Increase Near the End of Training

2 June 2025
Aaron Defazio
Main: 9 pages, 2 figures; bibliography: 2 pages
Abstract

During long-duration Large Language Model (LLM) training runs, the gradient norm increases rapidly near the end of training. In this short note, we show that this increase is due to an unintended interaction between weight decay, normalization layers, and the learning rate schedule. We propose a simple correction that fixes this behavior while also resulting in lower loss values throughout training.
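
The abstract does not spell out the correction, but it attributes the gradient growth to the interplay of weight decay, normalization layers, and a decaying learning rate. As a rough, hypothetical illustration (not the paper's verified fix), the PyTorch sketch below co-schedules AdamW's weight-decay coefficient with the learning rate so their ratio stays constant as the schedule decays; the model, linear schedule, and hyperparameters are placeholder assumptions.

```python
# Hypothetical sketch: decay the weight-decay coefficient in step with the
# learning-rate schedule, keeping lr / weight_decay constant. This is one
# plausible reading of the abstract, not a reproduction of the paper's method.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.LayerNorm(128),  # normalization layer, as mentioned in the abstract
    torch.nn.Linear(128, 10),
)
base_lr, base_wd, total_steps = 1e-3, 0.1, 1000
opt = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=base_wd)

for step in range(total_steps):
    # Placeholder linear-decay learning-rate schedule.
    scale = 1.0 - step / total_steps
    for group in opt.param_groups:
        group["lr"] = base_lr * scale
        # Assumed correction: scale weight decay by the same factor so that
        # lr / weight_decay is unchanged throughout training.
        group["weight_decay"] = base_wd * scale

    x = torch.randn(32, 128)          # dummy batch
    loss = model(x).pow(2).mean()     # dummy loss for illustration only
    opt.zero_grad()
    loss.backward()
    opt.step()
```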

@article{defazio2025_2506.02285,
  title={Why Gradients Rapidly Increase Near the End of Training},
  author={Aaron Defazio},
  journal={arXiv preprint arXiv:2506.02285},
  year={2025}
}