Batch size invariant Adam
Xi Wang, Laurence Aitchison
arXiv:2402.18824, 29 February 2024
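For context on what the title refers to: Adam's second-moment estimate depends on the mini-batch size, because squaring an averaged gradient suppresses noise in proportion to the number of samples averaged. The sketch below contrasts standard Adam with one way to remove that dependence, averaging squared per-micro-batch gradients instead of squaring the averaged gradient. This is a minimal illustration under that assumption, not necessarily the paper's exact algorithm; all function names and arguments are hypothetical.

```python
# Hypothetical sketch (not taken from the paper): standard Adam builds v from
# the squared *averaged* mini-batch gradient, so E[v] shrinks as the batch
# grows; averaging the squared per-micro-batch gradients keeps E[v] fixed.
import numpy as np

def adam_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step; grad is the averaged mini-batch gradient."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2        # depends on batch size
    m_hat = m / (1 - beta1**t)                   # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def invariant_adam_step(theta, m, v, micro_grads, t, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical batch-size-invariant variant: micro_grads has shape
    (num_micro_batches, dim); v averages g_i**2 over micro-batches."""
    mean_grad = np.mean(micro_grads, axis=0)     # same first moment as Adam
    mean_sq = np.mean(micro_grads**2, axis=0)    # key change: mean of squares
    m = beta1 * m + (1 - beta1) * mean_grad
    v = beta2 * v + (1 - beta2) * mean_sq
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The contrast rests on a standard fact: for i.i.d. micro-batch gradients with mean mu and per-micro-batch variance sigma^2, squaring after averaging B of them gives E[g_bar^2] = mu^2 + sigma^2 / B, which shrinks with B, while averaging the squares gives E[g_i^2] = mu^2 + sigma^2 regardless of B. That is the sense in which the second variant's update scale does not change with batch size.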
Papers citing "Batch size invariant Adam" (4 of 4 papers shown):
Xi Wang, Laurence Aitchison. "How to set AdamW's weight decay as you scale model and dataset size." 22 May 2024.
Dongseong Hwang. "FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information." 21 May 2024.
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora. "On the SDEs and Scaling Rules for Adaptive Gradient Algorithms." 20 May 2022.
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima." 15 Sep 2016.