2502.00213
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
31 January 2025
Akiyoshi Tomihari
Issei Sato
ODL
Papers citing "Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers"
2 / 52 papers shown
Title
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
James Martens
Roger C. Grosse
ODL
95
1,009
0
19 Mar 2015
Fine-Grained Visual Classification of Aircraft
Subhransu Maji
Esa Rahtu
Arno Solin
Matthew Blaschko
Andrea Vedaldi
107
2,252
0
21 Jun 2013