arXiv:1808.02941
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
8 August 2018
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
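For context on the class of methods this paper analyzes: "Adam-type" algorithms update parameters with a momentum average of past gradients, scaled elementwise by a quantity built from past squared gradients (plain Adam and AMSGrad are special cases of this template). The Python sketch below is a minimal illustration using the standard Adam instance of that template; the function name, hyperparameter values, and the toy quadratic objective are assumptions chosen for illustration, not taken from the paper or this page.

import numpy as np

def adam_type_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (momentum term).
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponential moving average of squared gradients.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates, as in standard Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Update: momentum-averaged gradient scaled by the root of the second moment.
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Toy usage: minimize f(x) = ||x||^2 from a random start.
x = np.random.randn(5)
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 5001):
    grad = 2.0 * x
    x, m, v = adam_type_step(x, grad, m, v, t)
print(np.linalg.norm(x))  # prints a small value, near the minimizer at 0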
Papers citing "On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization"
50 / 72 papers shown
Sharp higher order convergence rates for the Adam optimizer
Steffen Dereich
Arnulf Jentzen
Adrian Riekert
ODL
61
0
0
28 Apr 2025
A Langevin sampling algorithm inspired by the Adam optimizer
B. Leimkuhler
René Lohmann
P. Whalley
79
0
0
26 Apr 2025
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Xianliang Li
Jun Luo
Zhiwei Zheng
Hanxiao Wang
Li Luo
Lingkun Wen
Linlong Wu
Sheng Xu
74
0
0
29 Nov 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
38
2
0
21 Oct 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
78
2
0
26 May 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
39
1
0
12 Apr 2024
Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
55
13
0
05 Apr 2024
Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning
Jiawu Tian
Liwei Xu
Xiaowei Zhang
Yongqi Li
ODL
56
0
0
02 Apr 2024
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
Qi Zhang
Yi Zhou
Shaofeng Zou
42
4
0
01 Apr 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
46
2
0
07 Feb 2024
On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong
Junhong Lin
51
12
0
06 Feb 2024
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
38
4
0
02 Jul 2023
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang
Xiang Li
Ilyas Fatkhullin
Niao He
47
15
0
21 May 2023
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong
Meng Lu
Xiaotong Yu
Jian-Peng Cao
Zhong Chen
D. Guo
X. Qu
MLT
43
0
0
14 Apr 2023
Stochastic Variable Metric Proximal Gradient with variance reduction for non-convex composite optimization
G. Fort
Eric Moulines
46
6
0
02 Jan 2023
Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
Xiaoyun Li
Ping Li
FedML
39
4
0
25 Nov 2022
Fast Adaptive Federated Bilevel Optimization
Feihu Huang
FedML
29
7
0
02 Nov 2022
TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization
Xiang Li
Junchi Yang
Niao He
34
8
0
31 Oct 2022
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Wenhan Xian
Feihu Huang
Heng Huang
FedML
35
0
0
14 Oct 2022
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
36
66
0
23 Aug 2022
Critical Batch Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL
38
4
0
21 Aug 2022
Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang
Congliang Chen
Naichen Shi
Ruoyu Sun
Zhi-Quan Luo
23
63
0
20 Aug 2022
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale
Gaoyuan Zhang
Songtao Lu
Yihua Zhang
Xiangyi Chen
Pin-Yu Chen
Quanfu Fan
Lee Martie
L. Horesh
Mingyi Hong
Sijia Liu
OOD
35
12
0
13 Jun 2022
Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
Junchi Yang
Xiang Li
Niao He
ODL
45
22
0
01 Jun 2022
Efficient-Adam: Communication-Efficient Distributed Adam
Congliang Chen
Li Shen
Wei Liu
Zhi-Quan Luo
34
19
0
28 May 2022
Communication-Efficient Adaptive Federated Learning
Yujia Wang
Lu Lin
Jinghui Chen
FedML
27
71
0
05 May 2022
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
Ali Kavis
Kfir Y. Levy
V. Cevher
25
41
0
06 Apr 2022
An Adaptive Gradient Method with Energy and Momentum
Hailiang Liu
Xuping Tian
ODL
21
9
0
23 Mar 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
29
20
0
12 Feb 2022
Understanding AdamW through Proximal Methods and Scale-Freeness
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona
39
63
0
31 Jan 2022
A Stochastic Bundle Method for Interpolating Networks
Alasdair Paren
Leonard Berrada
Rudra P. K. Poudel
M. P. Kumar
26
4
0
29 Jan 2022
Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising
Weijie Zhao
Xuewu Jiao
Mingqing Hu
Xiaoyun Li
Xinming Zhang
Ping Li
3DV
40
8
0
05 Jan 2022
Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Hideaki Iiduka
15
0
0
14 Dec 2021
A Novel Convergence Analysis for Algorithms of the Adam Family
Zhishuai Guo
Yi Tian Xu
W. Yin
Rong Jin
Tianbao Yang
39
48
0
07 Dec 2021
Adaptive Differentially Private Empirical Risk Minimization
Xiaoxia Wu
Lingxiao Wang
Irina Cristali
Quanquan Gu
Rebecca Willett
40
6
0
14 Oct 2021
Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
Fu Wei
Chenglong Bao
Yang Liu
33
19
0
04 Oct 2021
On the Convergence of Decentralized Adaptive Gradient Methods
Xiangyi Chen
Belhal Karimi
Weijie Zhao
Ping Li
23
21
0
07 Sep 2021
The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size
Hideaki Iiduka
26
2
0
26 Aug 2021
A New Adaptive Gradient Method with Gradient Decomposition
Zhou Shao
Tong Lin
ODL
13
0
0
18 Jul 2021
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan
Sepehr Sameni
L. Cerkezi
Givi Meishvili
Adam Bielski
Paolo Favaro
ODL
58
2
0
07 Jul 2021
Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
Kushal Chakrabarti
Nikhil Chopra
ODL
AI4CE
40
9
0
31 May 2021
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dong-Young Lim
Sotirios Sabanis
39
11
0
28 May 2021
CADA: Communication-Adaptive Distributed Adam
Tianyi Chen
Ziye Guo
Yuejiao Sun
W. Yin
ODL
14
24
0
31 Dec 2020
SMG: A Shuffling Gradient-Based Method with Momentum
Trang H. Tran
Lam M. Nguyen
Quoc Tran-Dinh
23
21
0
24 Nov 2020
A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
Zhiqi Bu
Shiyun Xu
Kan Chen
38
17
0
25 Oct 2020
A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms
Chao Ma
Lei Wu
Weinan E
ODL
19
23
0
14 Sep 2020
Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization
Tianyi Chen
Yuejiao Sun
W. Yin
48
81
0
25 Aug 2020
A High Probability Analysis of Adaptive SGD with Momentum
Xiaoyun Li
Francesco Orabona
92
66
0
28 Jul 2020
Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities
Alina Ene
Huy Le Nguyen
Adrian Vladu
ODL
30
28
0
17 Jul 2020
A General Family of Stochastic Proximal Gradient Methods for Deep Learning
Jihun Yun
A. Lozano
Eunho Yang
22
12
0
15 Jul 2020