2101.11075
Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization
26 January 2021
Aaron Defazio
Samy Jelassi
ODL
Papers citing
"Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization"
27 / 27 papers shown
Title
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti
Emanuele Marconato
Paolo Morettin
Andrea Passerini
Stefano Teso
70
3
0
16 Feb 2025
Dual Averaging is Surprisingly Effective for Deep Learning Optimization
Samy Jelassi
Aaron Defazio
48
5
0
20 Oct 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
61
164
0
03 Jul 2020
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball
Othmane Sebbouh
Robert Mansel Gower
Aaron Defazio
29
22
0
14 Jun 2020
The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization
Aaron Defazio
Robert Mansel Gower
9
6
0
01 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
327
41,106
0
28 May 2020
End-to-End Variational Networks for Accelerated MRI Reconstruction
Anuroop Sriram
Jure Zbontar
Tullie Murrell
Aaron Defazio
C. L. Zitnick
N. Yakubova
Florian Knoll
Patricia M. Johnson
DRL
20
314
0
14 Apr 2020
A Simple Convergence Proof of Adam and Adagrad
Alexandre Défossez
Léon Bottou
Francis R. Bach
Nicolas Usunier
87
150
0
05 Mar 2020
Offset Sampling Improves Deep Learning based Accelerated MRI Reconstructions by Exploiting Symmetry
Aaron Defazio
16
8
0
02 Dec 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
51
259
0
11 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
319
24,160
0
26 Jul 2019
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
25
2,482
0
19 Apr 2019
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou
Li Shen
Zequn Jie
Weizhong Zhang
Wei Liu
43
368
0
23 Nov 2018
fastMRI: An Open Dataset and Benchmarks for Accelerated MRI
Jure Zbontar
Florian Knoll
Anuroop Sriram
Tullie Murrell
Zhengnan Huang
...
Erich Owens
C. L. Zitnick
M. Recht
D. Sodickson
Yvonne W. Lui
OOD
22
836
0
21 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
605
93,936
0
11 Oct 2018
Online Adaptive Methods, Universality and Acceleration
Kfir Y. Levy
A. Yurtsever
Volkan Cevher
ODL
48
89
0
08 Sep 2018
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou
Yiqi Tang
Yuan Cao
Ziyan Yang
Quanquan Gu
25
150
0
16 Aug 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward
Xiaoxia Wu
Léon Bottou
ODL
45
365
0
05 Jun 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyu Li
Francesco Orabona
49
294
0
21 May 2018
signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu-Xiang Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML
ODL
60
1,026
0
13 Feb 2018
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
39
1,023
0
23 May 2017
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles
Philipp Hennig
58
163
0
22 May 2017
Sequence-to-Sequence Learning as Beam-Search Optimization
Sam Wiseman
Alexander M. Rush
89
591
0
09 Jun 2016
Identity Mappings in Deep Residual Networks
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
213
10,149
0
16 Mar 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
836
192,638
0
10 Dec 2015
Scale-Free Algorithms for Online Linear Optimization
Francesco Orabona
D. Pál
ODL
38
52
0
19 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
264
149,474
0
22 Dec 2014