ResearchTrend.AI
Adam Can Converge Without Any Modification On Update Rules
arXiv:2208.09632, 20 August 2022
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
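For context on the "update rules" referenced in the title, below is a minimal single-parameter sketch of the standard Adam step from Kingma & Ba (2014), which the paper above analyzes without modification. The function and variable names are illustrative, not taken from any of the listed papers.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step (t = 1) the bias corrections make the update approximately lr in magnitude regardless of the gradient's scale, which is the sign-like behavior the convergence literature above often highlights.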

Papers citing "Adam Can Converge Without Any Modification On Update Rules" (37 papers shown)
Unified Parameter-Efficient Unlearning for LLMs (30 Nov 2024)
  Chenlu Ding, Jiancan Wu, Yancheng Yuan, Jinda Lu, Kai Zhang, Alex Su, Xiang Wang, Xiangnan He

How Does Critical Batch Size Scale in Pre-training? (29 Oct 2024)
  Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade

An Attention-Based Algorithm for Gravity Adaptation Zone Calibration (06 Oct 2024)
  Chen Yu
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning (30 Jul 2024)
  Yupeng Chen, Senmiao Wang, Zhihang Lin, Yushun Zhang, Tian Ding, Ruoyu Sun

Resolving Discrepancies in Compute-Optimal Scaling of Language Models (27 Jun 2024)
  Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon

Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance (01 Apr 2024)
  Qi Zhang, Yi Zhou, Shaofeng Zou

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions (06 Feb 2024)
  Yusu Hong, Junhong Lin
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness (27 Jun 2022)
  Hideaki Iiduka

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (28 Jan 2022)
  Shaden Smith, M. Patwary, Brandon Norick, P. LeGresley, Samyam Rajbhandari, ..., Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

On the Convergence of mSGD and AdaGrad for Stochastic Optimization (26 Jan 2022)
  Ruinan Jin, Yu Xing, Xingkang He

A Novel Convergence Analysis for Algorithms of the Adam Family (07 Dec 2021)
  Zhishuai Guo, Yi Tian Xu, W. Yin, Rong Jin, Tianbao Yang
SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients (15 Jun 2021)
  Feihu Huang, Junyi Li, Heng Huang

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration (14 Jan 2021)
  Congliang Chen, Li Shen, Fangyu Zou, Wei Liu

Asymptotic study of stochastic adaptive algorithm in non-convex landscape (10 Dec 2020)
  S. Gadat, Ioana Gavra

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (22 Oct 2020)
  Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
Language Models are Few-Shot Learners (28 May 2020)
  Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

On the Variance of the Adaptive Learning Rate and Beyond (08 Aug 2019)
  Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization (09 May 2019)
  Hao Yu, Rong Jin, Sen Yang

On the Convergence of Adam and Beyond (19 Apr 2019)
  Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
Adaptive Gradient Methods with Dynamic Bound of Learning Rate (26 Feb 2019)
  Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (09 Jan 2019)
  Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov

A Sufficient Condition for Convergences of Adam and RMSProp (23 Nov 2018)
  Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron (16 Oct 2018)
  Sharan Vaswani, Francis R. Bach, Mark Schmidt

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods (29 Sep 2018)
  Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
A Unified Analysis of Stochastic Momentum Methods for Deep Learning (30 Aug 2018)
  Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization (16 Aug 2018)
  Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization (08 Aug 2018)
  Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration (18 Jul 2018)
  Soham De, Anirbit Mukherjee, Enayat Ullah
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks (18 Jun 2018)
  Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Attention Is All You Need (12 Jun 2017)
  Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin

Get To The Point: Summarization with Pointer-Generator Networks (14 Apr 2017)
  A. See, Peter J. Liu, Christopher D. Manning

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (30 Mar 2017)
  Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Image-to-Image Translation with Conditional Adversarial Networks (21 Nov 2016)
  Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros

Pointer Sentinel Mixture Models (26 Sep 2016)
  Stephen Merity, Caiming Xiong, James Bradbury, R. Socher

Deep Residual Learning for Image Recognition (10 Dec 2015)
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (19 Nov 2015)
  Alec Radford, Luke Metz, Soumith Chintala

Adam: A Method for Stochastic Optimization (22 Dec 2014)
  Diederik P. Kingma, Jimmy Ba