
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods

29 September 2018
Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
arXiv:1810.00143 (abs | PDF | HTML)
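
For orientation beyond the metadata above: in Adam, the second-moment estimate v_t is built from the same gradient g_t that it then rescales, and the paper's premise, as the title suggests, is that removing this correlation restores convergence. The following is a minimal, hedged Python sketch of that temporal-decorrelation idea only, not the paper's reference algorithm; the function name, the warm-up handling, and the exact first-moment weighting are assumptions, and the paper's block-wise spatial operation is omitted.

# Hedged sketch of a temporally decorrelated, AdaShift-style update.
# The second moment is fed the gradient from n steps ago (g_{t-n}), so the
# numerator and denominator of the step depend on disjoint gradients.
from collections import deque
import numpy as np


def adashift_sketch(grad_fn, theta, steps, lr=0.01, beta1=0.9, beta2=0.999,
                    n=10, eps=1e-8):
    theta = np.asarray(theta, dtype=float).copy()
    window = deque(maxlen=n)                  # the n most recent gradients
    v = np.zeros_like(theta)                  # second-moment estimate
    # Exponential weights for the first moment, oldest -> newest (assumed form).
    w = np.array([beta1 ** i for i in range(n)])[::-1]
    w = w / w.sum()

    for _ in range(steps):
        g = grad_fn(theta)
        if len(window) == n:
            g_shifted = window[0]             # g_{t-n}: excluded from the average below
            v = beta2 * v + (1 - beta2) * g_shifted ** 2
            window.append(g)                  # window now holds g_{t-n+1} ... g_t
            m = sum(wi * gi for wi, gi in zip(w, window))
            theta -= lr * m / (np.sqrt(v) + eps)
        else:
            window.append(g)                  # warm-up: just fill the buffer (a simplification)
    return theta


# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
print(adashift_sketch(lambda x: 2.0 * x, np.array([5.0, -3.0]), steps=500))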

Papers citing "AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods"

24 citing papers shown

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen
27 Oct 2023 · 75 · 19 · 0

A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog, Peter Albert, Moya Chen, Zach DeVito, David Esiobu, ..., Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang
AI4CE · 19 Apr 2023 · 97 · 35 · 0

FedAgg: Adaptive Federated Learning with Aggregated Gradients
Wenhao Yuan, Xuehe Wang
FedML · 28 Mar 2023 · 157 · 1 · 0

Provable Adaptivity of Adam under Non-uniform Smoothness
Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhirui Ma, Tie-Yan Liu, Zhimin Luo, Wei Chen
21 Aug 2022 · 77 · 26 · 0

Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhimin Luo
20 Aug 2022 · 122 · 70 · 0

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
OffRL, AI4CE · 12 Feb 2022 · 97 · 21 · 0

Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang, Yifan Ding, Tommy M. Tang, Nicha Dvornek, S. Tatikonda, James S. Duncan
ODL · 11 Oct 2021 · 85 · 4 · 0

Follow Your Path: a Progressive Method for Knowledge Distillation
Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, Lei Li
20 Jul 2021 · 62 · 15 · 0

A decreasing scaling transition scheme from Adam to SGD
Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
ODL · 12 Jun 2021 · 67 · 10 · 0

Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
Kushal Chakrabarti, Nikhil Chopra
ODL, AI4CE · 31 May 2021 · 83 · 9 · 0

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Congliang Chen, Li Shen, Fangyu Zou, Wei Liu
14 Jan 2021 · 88 · 29 · 0

Adaptive Gradient Method with Resilience and Momentum
Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang
ODL · 21 Oct 2020 · 26 · 0 · 0

GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online Optimization
Guido Carnevale, Francesco Farina, Ivano Notarnicola, G. Notarstefano
03 Sep 2020 · 72 · 24 · 0

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig
ODL · 03 Jul 2020 · 254 · 169 · 0

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
Shuai Zheng, Yanghua Peng, Sheng Zha, Mu Li
ODL · 24 Jun 2020 · 75 · 21 · 0

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo
ODL · 21 Apr 2020 · 81 · 28 · 0

Why are Adaptive Methods Good for Attention Models?
J.N. Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Surinder Kumar, S. Sra
06 Dec 2019 · 125 · 81 · 0

Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization
Anas Barakat, Pascal Bianchi
18 Nov 2019 · 84 · 12 · 0

Does Adam optimizer keep close to the optimal point?
Kiwook Bae, Heechang Ryu, Hayong Shin
ODL · 01 Nov 2019 · 34 · 18 · 0

An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun
ODL · 27 Oct 2019 · 57 · 48 · 0

On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
11 Oct 2019 · 162 · 259 · 0

Why gradient clipping accelerates training: A theoretical justification for adaptivity
J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie
28 May 2019 · 95 · 471 · 0

A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu
23 Nov 2018 · 92 · 373 · 0

Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate
Haiwen Huang, Changzhang Wang, Bin Dong
ODL · 19 May 2018 · 57 · 59 · 0