AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
arXiv:1810.00143 (v4, latest) · 29 September 2018
Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu

Papers citing "AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods" (24 of 24 papers shown)

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen
27 Oct 2023

A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog, Peter Albert, Moya Chen, Zach DeVito, David Esiobu, ..., Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang
AI4CE · 19 Apr 2023

FedAgg: Adaptive Federated Learning with Aggregated Gradients
Wenhao Yuan, Xuehe Wang
FedML · 28 Mar 2023

Provable Adaptivity of Adam under Non-uniform Smoothness
Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhirui Ma, Tie-Yan Liu, Zhimin Luo, Wei Chen
21 Aug 2022

Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhimin Luo
20 Aug 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
OffRL, AI4CE · 12 Feb 2022

Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang, Yifan Ding, Tommy M. Tang, Nicha Dvornek, S. Tatikonda, James S. Duncan
ODL · 11 Oct 2021

Follow Your Path: a Progressive Method for Knowledge Distillation
Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li
20 Jul 2021

A decreasing scaling transition scheme from Adam to SGD
Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
ODL · 12 Jun 2021

Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
Kushal Chakrabarti, Nikhil Chopra
ODL, AI4CE · 31 May 2021

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Congliang Chen, Li Shen, Fangyu Zou, Wei Liu
14 Jan 2021

Adaptive Gradient Method with Resilience and Momentum
Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang
ODL · 21 Oct 2020

GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online Optimization
Guido Carnevale, Francesco Farina, Ivano Notarnicola, G. Notarstefano
03 Sep 2020

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig
ODL · 03 Jul 2020

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
Shuai Zheng, Yanghua Peng, Sheng Zha, Mu Li
ODL · 24 Jun 2020

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo
ODL · 21 Apr 2020

Why are Adaptive Methods Good for Attention Models?
J.N. Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Surinder Kumar, S. Sra
06 Dec 2019

Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization
Anas Barakat, Pascal Bianchi
18 Nov 2019

Does Adam optimizer keep close to the optimal point?
Kiwook Bae, Heechang Ryu, Hayong Shin
ODL · 01 Nov 2019

An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun
ODL · 27 Oct 2019

On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
11 Oct 2019

Why gradient clipping accelerates training: A theoretical justification for adaptivity
J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie
28 May 2019

A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu
23 Nov 2018

Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate
Haiwen Huang, Changzhang Wang, Bin Dong
ODL · 19 May 2018