Understanding the Generalization Benefits of Late Learning Rate Decay

21 January 2024

Chao Ma

Papers citing "Understanding the Generalization Benefits of Late Learning Rate Decay"

Grokking phase transitions in learning local rules with gradient descent. Bojan Žunkovič, E. Ilievski. 26 Oct 2022.
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example. Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
Understanding Gradient Descent on Edge of Stability in Deep Learning. Sanjeev Arora, Zhiyuan Li, A. Panigrahi. 19 May 2022.
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.