ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

13 August 2022
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
    ODL

Papers citing "Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models"

33 papers
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
99
1
0
18 Mar 2025
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
133
5
0
25 Nov 2024
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y. Fu
Quentin Anthony
Yonatan Oren
S. Adams
...
Tri Dao
Percy Liang
Christopher Ré
Irina Rish
Ce Zhang
162
66
0
19 Nov 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
56
3
0
03 Oct 2024
Imaging foundation model for universal enhancement of non-ideal measurement CT
Yuxin Liu
Rongjun Ge
Yuting He
Zhan Wu
Chenyu You
Yuan Gao
Chenyu You
Ge Wang
Yang Chen
Shuo Li
MedIm
46
2
0
02 Oct 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
79
48
0
01 Aug 2024
DeepMpMRI: Tensor-decomposition Regularized Learning for Fast and High-Fidelity Multi-Parametric Microstructural MR Imaging
Wenxin Fan
Jian Cheng
Qiyuan Tian
Xinrui Ma
Jing Yang
J. Zou
Shanshan Wang
MedIm
51
1
0
06 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
44
18
0
08 Feb 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
67
1,049
0
08 Jan 2024
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
97
402
0
14 Apr 2022
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
54
391
0
07 Feb 2022
Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity
Huan Li
Zhouchen Lin
56
22
0
27 Jan 2022
MetaFormer Is Actually What You Need for Vision
Weihao Yu
Mi Luo
Pan Zhou
Chenyang Si
Yichen Zhou
Xinchao Wang
Jiashi Feng
Shuicheng Yan
117
893
0
22 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
40
14
0
01 Nov 2021
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
230
489
0
01 Oct 2021
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Zachary Nado
Justin M. Gilmer
Christopher J. Shallue
Rohan Anil
George E. Dahl
ODL
35
27
0
12 Feb 2021
Adam$^+$: A Stochastic Method with Adaptive Variance Reduction
Mingrui Liu
Wei Zhang
Francesco Orabona
Tianbao Yang
29
27
0
24 Nov 2020
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
ODL
64
505
0
15 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
129
4,993
0
08 Oct 2020
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
Yossi Arjevani
Y. Carmon
John C. Duchi
Dylan J. Foster
Ayush Sekhari
Karthik Sridharan
106
53
0
24 Jun 2020
Open Graph Benchmark: Datasets for Machine Learning on Graphs
Weihua Hu
Matthias Fey
Marinka Zitnik
Yuxiao Dong
Hongyu Ren
Bowen Liu
Michele Catasta
J. Leskovec
152
2,687
0
02 May 2020
MMDetection: Open MMLab Detection Toolbox and Benchmark
Kai-xiang Chen
Jiaqi Wang
Jiangmiao Pang
Yuhang Cao
Yu Xiong
...
Jingdong Wang
Jianping Shi
Wanli Ouyang
Chen Change Loy
Dahua Lin
VOS
78
2,845
0
17 Jun 2019
SPoC: Search-based Pseudocode to Code
Sumith Kulal
Panupong Pasupat
Kartik Chandra
Mina Lee
Oded Padon
A. Aiken
Percy Liang
42
215
0
12 Jun 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
553
4,735
0
13 May 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
129
991
0
01 Apr 2019
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo
Yuanhao Xiong
Yan Liu
Xu Sun
ODL
25
600
0
26 Feb 2019
Sharp Analysis for Nonconvex SGD Escaping from Saddle Points
Cong Fang
Zhouchen Lin
Tong Zhang
50
104
0
01 Feb 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
103
3,707
0
09 Jan 2019
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou
Yiqi Tang
Yuan Cao
Ziyan Yang
Quanquan Gu
27
150
0
16 Aug 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
42
193
0
18 Jun 2018
Deep Networks with Stochastic Depth
Gao Huang
Yu Sun
Zhuang Liu
Daniel Sedra
Kilian Q. Weinberger
119
2,344
0
30 Mar 2016
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
235
43,511
0
17 Sep 2014
Improvements to deep convolutional neural networks for LVCSR
Tara N. Sainath
Brian Kingsbury
Abdel-rahman Mohamed
George E. Dahl
G. Saon
H. Soltau
T. Beran
Aleksandr Aravkin
Bhuvana Ramabhadran
51
228
0
05 Sep 2013