AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
arXiv:1810.00143 · 29 September 2018
Zhiming Zhou
Qingru Zhang
Guansong Lu
Hongwei Wang
Weinan Zhang
Yong Yu
Papers citing "AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods" (35 papers)
On the Convergence of Adam-Type Algorithm for Bilevel Optimization under Unbounded Smoothness
Xiaochuan Gong
Jie Hao
Mingrui Liu
05 Mar 2025
A survey of synthetic data augmentation methods in computer vision
A. Mumuni
F. Mumuni
N. K. Gerrar
15 Mar 2024
Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn
Zhiyu Zhang
Yunbum Kook
Yan Dai
02 Feb 2024
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Bohan Wang
Jingwen Fu
Huishuai Zhang
Nanning Zheng
Wei Chen
27 Oct 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao Song
Chiwun Yang
17 Oct 2023
A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings
Alokendu Mazumder
Rishabh Sabharwal
Manan Tayal
Bhartendu Kumar
Punit Rathore
15 Sep 2023
Convergence of Adam Under Relaxed Assumptions
Haochuan Li
Alexander Rakhlin
Ali Jadbabaie
27 Apr 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
19 Apr 2023
FedAgg: Adaptive Federated Learning with Aggregated Gradients
Wenhao Yuan
Xuehe Wang
28 Mar 2023
Provable Adaptivity of Adam under Non-uniform Smoothness
Bohan Wang
Yushun Zhang
Huishuai Zhang
Qi Meng
Ruoyu Sun
Zhirui Ma
Tie-Yan Liu
Zhimin Luo
Wei Chen
21 Aug 2022
Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang
Congliang Chen
Naichen Shi
Ruoyu Sun
Zhimin Luo
20 Aug 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
12 Feb 2022
Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang
Yifan Ding
Tommy M. Tang
Nicha Dvornek
S. Tatikonda
James S. Duncan
11 Oct 2021
Follow Your Path: a Progressive Method for Knowledge Distillation
Wenxian Shi
Yuxuan Song
Hao Zhou
Bohan Li
Lei Li
20 Jul 2021
A decreasing scaling transition scheme from Adam to SGD
Kun Zeng
Jinlan Liu
Zhixia Jiang
Dongpo Xu
12 Jun 2021
Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
Kushal Chakrabarti
Nikhil Chopra
31 May 2021
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Congliang Chen
Li Shen
Fangyu Zou
Wei Liu
14 Jan 2021
Adaptive Gradient Method with Resilience and Momentum
Jie Liu
Chen Lin
Chuming Li
Lu Sheng
Ming Sun
Junjie Yan
Wanli Ouyang
21 Oct 2020
GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online Optimization
Guido Carnevale
Francesco Farina
Ivano Notarnicola
G. Notarstefano
03 Sep 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
03 Jul 2020
AdaSGD: Bridging the gap between SGD and Adam
Jiaxuan Wang
Jenna Wiens
30 Jun 2020
Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
Shuai Zheng
Yanghua Peng
Sheng Zha
Mu Li
24 Jun 2020
Quantized Adam with Error Feedback
Congliang Chen
Li Shen
Haozhi Huang
Wei Liu
29 Apr 2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li
Zhaoyang Zhang
Xinjiang Wang
Ping Luo
21 Apr 2020
Why are Adaptive Methods Good for Attention Models?
J.N. Zhang
Sai Praneeth Karimireddy
Andreas Veit
Seungyeon Kim
Sashank J. Reddi
Surinder Kumar
S. Sra
06 Dec 2019
Domain-independent Dominance of Adaptive Methods
Pedro H. P. Savarese
David A. McAllester
Sudarshan Babu
Michael Maire
04 Dec 2019
Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization
Anas Barakat
Pascal Bianchi
18 Nov 2019
Does Adam optimizer keep close to the optimal point?
Kiwook Bae
Heechang Ryu
Hayong Shin
01 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
27 Oct 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
11 Oct 2019
Why gradient clipping accelerates training: A theoretical justification for adaptivity
J.N. Zhang
Tianxing He
S. Sra
Ali Jadbabaie
28 May 2019
Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning
Shuai Zheng
James T. Kwok
23 May 2019
Towards Efficient and Unbiased Implementation of Lipschitz Continuity in GANs
Zhiming Zhou
Jian Shen
Yuxuan Song
Weinan Zhang
Yong Yu
02 Apr 2019
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou
Li Shen
Zequn Jie
Weizhong Zhang
Wei Liu
23 Nov 2018
Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate
Haiwen Huang
Changzhang Wang
Bin Dong
19 May 2018