Batch size invariant Adam
Xi Wang, Laurence Aitchison
arXiv:2402.18824, 29 February 2024
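For context on what the title refers to: Adam's second-moment estimate depends on the mini-batch size, because squaring an averaged gradient suppresses noise in proportion to the number of samples averaged. The sketch below contrasts standard Adam with one way to remove that dependence, averaging squared per-micro-batch gradients instead of squaring the averaged gradient. This is a minimal illustration under that assumption, not necessarily the paper's exact algorithm; all function names and arguments are hypothetical.

```python
# Hypothetical sketch (not taken from the paper): standard Adam builds v from
# the squared *averaged* mini-batch gradient, so E[v] shrinks as the batch
# grows; averaging the squared per-micro-batch gradients keeps E[v] fixed.
import numpy as np

def adam_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step; grad is the averaged mini-batch gradient."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2        # depends on batch size
    m_hat = m / (1 - beta1**t)                   # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def invariant_adam_step(theta, m, v, micro_grads, t, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical batch-size-invariant variant: micro_grads has shape
    (num_micro_batches, dim); v averages g_i**2 over micro-batches."""
    mean_grad = np.mean(micro_grads, axis=0)     # same first moment as Adam
    mean_sq = np.mean(micro_grads**2, axis=0)    # key change: mean of squares
    m = beta1 * m + (1 - beta1) * mean_grad
    v = beta2 * v + (1 - beta2) * mean_sq
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The contrast rests on a standard fact: for i.i.d. micro-batch gradients with mean mu and per-micro-batch variance sigma^2, squaring after averaging B of them gives E[g_bar^2] = mu^2 + sigma^2 / B, which shrinks with B, while averaging the squares gives E[g_i^2] = mu^2 + sigma^2 regardless of B. That is the sense in which the second variant's update scale does not change with batch size.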
Papers citing "Batch size invariant Adam" (4 of 4 papers shown):
Xi Wang, Laurence Aitchison. "How to set AdamW's weight decay as you scale model and dataset size." 22 May 2024.
Dongseong Hwang. "FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information." 21 May 2024.
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora. "On the SDEs and Scaling Rules for Adaptive Gradient Algorithms." 20 May 2022.
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima." 15 Sep 2016.