Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
arXiv:2306.04815 · 7 June 2023 · ArXiv / PDF / HTML
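As a quick orientation to the catapult phenomenon named in the title, here is a minimal sketch, assuming the standard two-parameter quadratic model f(a, b) = a·b used throughout this literature (cf. the Lewkowycz et al. paper listed below). The function name, constants, and learning rates are illustrative choices, not taken from the paper.

```python
# Hedged toy illustration (not the paper's code): full-batch GD on
# L(a, b) = (a*b - y)^2 / 2, a single-example two-parameter model.
# The curvature (tangent-kernel) proxy here is a^2 + b^2.

def run_gd(lr, a=2.0, b=0.55, y=1.0, steps=80):
    """Run gradient descent and print the loss/sharpness trajectory."""
    for t in range(steps):
        r = a * b - y                  # residual
        loss = 0.5 * r * r
        sharpness = a * a + b * b      # curvature proxy at (a, b)
        if t % 8 == 0:
            print(f"  t={t:3d}  loss={loss:9.5f}  sharpness={sharpness:6.3f}")
        a, b = a - lr * r * b, b - lr * r * a   # simultaneous update
    return a, b

# Below 2 / (a^2 + b^2) ~ 0.46 the loss falls monotonically to the
# nearby sharp minimum; above it the loss first grows (the "catapult"
# spike), the spike shrinks a^2 + b^2, and GD restabilizes at a
# flatter minimum.
print("small learning rate:")
run_gd(0.10)
print("large learning rate (catapult):")
run_gd(0.50)
```

Running the large-learning-rate case shows the loss rising for roughly the first twenty steps before collapsing, with the final sharpness ending below that of the small-learning-rate run; this spike-then-flatten behavior is the mechanism whose effect on feature learning and generalization the paper studies.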

Papers citing "Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning"

13 / 13 papers shown. Each entry gives the title, authors, topic tags (where assigned), the site's three counters, and the publication date.

Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent
Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi
MLT · 42 / 0 / 0 · 31 Mar 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli
57 / 4 / 0 · 17 Feb 2025

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe
40 / 3 / 0 · 22 Sep 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
73 / 5 / 1 · 25 May 2024

Linear Recursive Feature Machines provably recover low-rank matrices
Adityanarayanan Radhakrishnan, Misha Belkin, Dmitriy Drusvyatskiy
58 / 8 / 0 · 09 Jan 2024

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla
38 / 7 / 0 · 02 Oct 2023

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Libin Zhu, Chaoyue Liu, M. Belkin
GNN, AI4CE · 23 / 4 / 0 · 24 May 2022

Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora, Zhiyuan Li, A. Panigrahi
MLT · 83 / 91 / 0 · 19 May 2022

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao
AI4CE · 57 / 40 / 0 · 07 Oct 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
89 / 72 / 0 · 29 Sep 2021

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL · 159 / 235 / 0 · 04 Mar 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL · 310 / 2,892 / 0 · 15 Sep 2016

Densely Connected Convolutional Networks
Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
PINN, 3DV · 321 / 36,420 / 0 · 25 Aug 2016