Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig
arXiv:2007.01547 · 3 July 2020 · [ODL]
Papers citing "Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers" (39 of 39 papers shown):
1. Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers. Akiyoshi Tomihari, Issei Sato. 31 Jan 2025. [ODL]
2. Learning Versatile Optimizers on a Compute Diet. A. Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky. 22 Jan 2025.
3. Debiasing Mini-Batch Quadratics for Applications in Deep Learning. Lukas Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig. 18 Oct 2024.
4. Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes. Nikita Kiselev, Andrey Grabovoy. 18 Sep 2024.
5. Deconstructing What Makes a Good Optimizer for Language Models. Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade. 10 Jul 2024.
6. An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes. Antonio Orvieto, Lin Xiao. 05 Jul 2024.
7. Hard ASH: Sparsity and the right optimizer make a continual learner. Santtu Keskinen. 26 Apr 2024. [CLL]
8. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models. Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti. 29 Feb 2024.
9. MADA: Meta-Adaptive Optimizers through hyper-gradient Descent. Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, B. Kveton, V. Cevher. 17 Jan 2024.
10. No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models. Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner. 12 Jul 2023.
11. Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances. Marcel Kühn, B. Rosenow. 08 Jun 2023.
12. MoMo: Momentum Models for Adaptive Learning Rates. Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert Mansel Gower. 12 May 2023.
13. Automatic Gradient Descent: Deep Learning without Hyperparameters. Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue. 11 Apr 2023. [ODL]
14. Expeditious Saliency-guided Mix-up through Random Gradient Thresholding. Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang. 09 Dec 2022. [AAML]
15. A survey of deep learning optimizers -- first and second order methods. Rohan Kashyap. 28 Nov 2022. [ODL]
16. VeLO: Training Versatile Learned Optimizers by Scaling Up. Luke Metz, James Harrison, C. Freeman, Amil Merchant, Lucas Beyer, ..., Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Narain Sohl-Dickstein. 17 Nov 2022.
17. A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. James Harrison, Luke Metz, Jascha Narain Sohl-Dickstein. 22 Sep 2022.
18. Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs. Severin Reiz, T. Neckel, H. Bungartz. 03 Aug 2022. [ODL]
19. deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. Dennis Ulmer, Christian Hardmeier, J. Frellsen. 14 Apr 2022.
20. Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices. Dixian Zhu, Xiaodong Wu, Tianbao Yang. 27 Mar 2022.
21. Practical tradeoffs between memory, compute, and performance in learned optimizers. Luke Metz, C. Freeman, James Harrison, Niru Maheswaranathan, Jascha Narain Sohl-Dickstein. 22 Mar 2022.
22. Adaptive Gradient Methods with Local Guarantees. Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan. 02 Mar 2022. [ODL]
23. DeepCreativity: Measuring Creativity with Deep Learning Techniques. Giorgio Franceschelli, Mirco Musolesi. 16 Jan 2022.
24. Exponential escape efficiency of SGD from sharp minima in non-stationary regime. Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
25. Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
26. The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size. Hideaki Iiduka. 26 Aug 2021.
27. Accelerating Federated Learning with a Global Biased Optimiser. Jed Mills, Jia Hu, Geyong Min, Rui Jin, Siwei Zheng, Jin Wang. 20 Aug 2021. [FedML, AI4CE]
28. Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks. S. Maddu, D. Sturm, Christian L. Müller, I. Sbalzarini. 02 Jul 2021. [AI4CE]
29. Ranger21: a synergistic deep learning optimizer. Less Wright, Nestor Demeure. 25 Jun 2021. [ODL, AI4CE]
30. AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly. Yuchen Jin, Dinesh Manocha, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy. 22 May 2021.
31. Learning by Turning: Neural Architecture Aware Optimisation. Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue. 14 Feb 2021. [ODL]
32. FCM-RDpA: TSK Fuzzy Regression Model Construction Using Fuzzy C-Means Clustering, Regularization, DropRule, and Powerball AdaBelief. Zhenhua Shi, Dongrui Wu, Chenfeng Guo, Changming Zhao, Yuqi Cui, Fei-Yue Wang. 30 Nov 2020.
33. A straightforward line search approach on the expected empirical loss for stochastic deep learning problems. Max Mutschler, A. Zell. 02 Oct 2020.
34. Review: Deep Learning in Electron Microscopy. Jeffrey M. Ede. 17 Sep 2020.
35. Compositional ADAM: An Adaptive Compositional Solver. Rasul Tutunov, Minne Li, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou-Ammar. 10 Feb 2020. [ODL]
36. diffGrad: An Optimization Method for Convolutional Neural Networks. S. Dubey, Soumendu Chakraborty, S. K. Roy, Snehasis Mukherjee, S. Singh, B. B. Chaudhuri. 12 Sep 2019. [ODL]
37. Quasi-hyperbolic momentum and Adam for deep learning. Jerry Ma, Denis Yarats. 16 Oct 2018. [ODL]
38. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Y. Gal, Akash Srivastava. 13 Jun 2018. [ODL]
39. L4: Practical loss-based stepsize adaptation for deep learning. Michal Rolínek, Georg Martius. 14 Feb 2018. [ODL]