Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

3 July 2020 · Robin M. Schmidt, Frank Schneider, Philipp Hennig · ODL

Papers citing "Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers"

39 / 39 papers shown

Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato · ODL · 31 Jan 2025

Learning Versatile Optimizers on a Compute Diet
A. Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky · 22 Jan 2025

Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig · 18 Oct 2024

Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes
Nikita Kiselev, Andrey Grabovoy · 18 Sep 2024

Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade · 10 Jul 2024

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
Antonio Orvieto, Lin Xiao · 05 Jul 2024

Hard ASH: Sparsity and the right optimizer make a continual learner
Santtu Keskinen · CLL · 26 Apr 2024

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti · 29 Feb 2024

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, B. Kveton, V. Cevher · 17 Jan 2024

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow · 08 Jun 2023

MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert Mansel Gower · 12 May 2023

Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue · ODL · 11 Apr 2023

Expeditious Saliency-guided Mix-up through Random Gradient Thresholding
Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang · AAML · 09 Dec 2022

A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap · ODL · 28 Nov 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz, James Harrison, C. Freeman, Amil Merchant, Lucas Beyer, ..., Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Narain Sohl-Dickstein · 17 Nov 2022

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison, Luke Metz, Jascha Narain Sohl-Dickstein · 22 Sep 2022

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
Severin Reiz, T. Neckel, H. Bungartz · ODL · 03 Aug 2022

deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
Dennis Ulmer, Christian Hardmeier, J. Frellsen · 14 Apr 2022

Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices
Dixian Zhu, Xiaodong Wu, Tianbao Yang · 27 Mar 2022

Practical tradeoffs between memory, compute, and performance in learned optimizers
Luke Metz, C. Freeman, James Harrison, Niru Maheswaranathan, Jascha Narain Sohl-Dickstein · 22 Mar 2022

Adaptive Gradient Methods with Local Guarantees
Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan · ODL · 02 Mar 2022

DeepCreativity: Measuring Creativity with Deep Learning Techniques
Giorgio Franceschelli, Mirco Musolesi · 16 Jan 2022

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi · 07 Nov 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021

The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size
Hideaki Iiduka · 26 Aug 2021

Accelerating Federated Learning with a Global Biased Optimiser
Jed Mills, Jia Hu, Geyong Min, Rui Jin, Siwei Zheng, Jin Wang · FedML, AI4CE · 20 Aug 2021

Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks
S. Maddu, D. Sturm, Christian L. Müller, I. Sbalzarini · AI4CE · 02 Jul 2021

Ranger21: a synergistic deep learning optimizer
Less Wright, Nestor Demeure · ODL, AI4CE · 25 Jun 2021

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
Yuchen Jin, Dinesh Manocha, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy · 22 May 2021

Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue · ODL · 14 Feb 2021

FCM-RDpA: TSK Fuzzy Regression Model Construction Using Fuzzy C-Means Clustering, Regularization, DropRule, and Powerball AdaBelief
Zhenhua Shi, Dongrui Wu, Chenfeng Guo, Changming Zhao, Yuqi Cui, Fei-Yue Wang · 30 Nov 2020

A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
Max Mutschler, A. Zell · 02 Oct 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede · 17 Sep 2020

Compositional ADAM: An Adaptive Compositional Solver
Rasul Tutunov, Minne Li, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou-Ammar · ODL · 10 Feb 2020

diffGrad: An Optimization Method for Convolutional Neural Networks
S. Dubey, Soumendu Chakraborty, S. K. Roy, Snehasis Mukherjee, S. Singh, B. B. Chaudhuri · ODL · 12 Sep 2019

Quasi-hyperbolic momentum and Adam for deep learning
Jerry Ma, Denis Yarats · ODL · 16 Oct 2018

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Y. Gal, Akash Srivastava · ODL · 13 Jun 2018

L4: Practical loss-based stepsize adaptation for deep learning
Michal Rolínek, Georg Martius · ODL · 14 Feb 2018