2010.07468
Cited By
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
15 October 2020
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
ODL
Papers citing
"AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients"
50 / 76 papers shown
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Wenbo Zhang
MQ
57
0
0
01 May 2025
Modelling Mean-Field Games with Neural Ordinary Differential Equations
Anna C. M. Thöni
Yoram Bachrach
Tal Kachman
38
0
0
17 Apr 2025
Molecular Learning Dynamics
Yaroslav Gusev
Vitaly Vanchurin
28
1
0
14 Apr 2025
Covariant Gradient Descent
Dmitry Guskov
Vitaly Vanchurin
26
2
0
07 Apr 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
147
0
0
22 Jan 2025
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao-quan Song
ODL
89
2
0
22 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhe Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
197
1
0
16 Dec 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
108
5
0
25 Nov 2024
Gradient-free variational learning with conditional mixture networks
Conor Heins
Hao Wu
Dimitrije Marković
Alexander Tschantz
Jeff Beck
Christopher L. Buckley
BDL
31
2
0
29 Aug 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
76
2
0
26 May 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
24
2
0
17 Jan 2024
BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
Zelin Ni
Hang Yu
Shizhan Liu
Jianguo Li
Weiyao Lin
AI4TS
26
30
0
31 Oct 2023
Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal
Leah A. Chrestien
Tomás Pevný
Stefan Edelkamp
Antonín Komenda
36
9
0
30 Oct 2023
Enhancing Low-Order Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier–Stokes Equations
Shinhoo Kang
Emil M. Constantinescu
AI4CE
22
0
0
29 Oct 2023
Smooth Exact Gradient Descent Learning in Spiking Neural Networks
Christian Klos
Raoul-Martin Memmesheimer
43
6
0
25 Sep 2023
Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments
Najmeh Mohammadbagheri
Fardin Ayar
A. Nickabadi
R. Safabakhsh
CVBM
GAN
24
3
0
25 Sep 2023
Automatic Differentiation for Inverse Problems with Applications in Quantum Transport
I. Williams
E. Polizzi
18
1
0
18 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
20
41
0
12 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
30
4
0
02 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
27
8
0
26 Jun 2023
Long-range Language Modeling with Self-retrieval
Ohad Rubin
Jonathan Berant
RALM
KELM
19
18
0
23 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
30
0
0
25 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
46
128
0
23 May 2023
MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation
Abdul Rehman Khan
Asifullah Khan
ViT
MedIm
39
14
0
15 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
30
10
0
12 May 2023
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Drew Penney
Bin Li
Lizhong Chen
J. Sydir
Anna Drewek-Ossowicka
R. Illikkal
Charlie Tai
R. Iyer
Andrew J. Herdrich
26
1
0
10 Apr 2023
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization
Feihu Huang
Chunyu Xuan
Xinrui Wang
Siqi Zhang
Songcan Chen
28
7
0
07 Mar 2023
Learning Gradually Non-convex Image Priors Using Score Matching
Erich Kobler
T. Pock
40
3
0
21 Feb 2023
Dataset Distillation with Convexified Implicit Gradients
Noel Loo
Ramin Hasani
Mathias Lechner
Daniela Rus
DD
31
41
0
13 Feb 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
61
350
0
13 Feb 2023
Weight Prediction Boosts the Convergence of AdamW
Lei Guan
21
15
0
01 Feb 2023
Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization
Davood Wadi
M. Fredette
S. Sénécal
ODL
AI4CE
8
0
0
24 Jan 2023
Solving the Weather4cast Challenge via Visual Transformers for 3D Images
Yury Belousov
Sergey Polezhaev
Brian Pulfer
13
3
0
05 Dec 2022
Denoising Diffusion for Sampling SAT Solutions
Kārlis Freivalds
Sergejs Kozlovics
13
2
0
30 Nov 2022
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
Enneng Yang
Junwei Pan
Ximei Wang
Haibin Yu
Li Shen
Xihua Chen
Lei Xiao
Jie Jiang
G. Guo
38
43
0
28 Nov 2022
Dealing with missing data using attention and latent space regularization
J. Penny-Dimri
Christoph Bergmeir
Julian Smith
27
0
0
14 Nov 2022
Black Box Lie Group Preconditioners for SGD
Xi-Lin Li
13
8
0
08 Nov 2022
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Andrey D. Ignatov
Radu Timofte
Maurizio Denna
Abdelbadie Younes
Ganzorig Gankhuyag
...
Jing Liu
Garas Gendy
Nabil Sabor
J. Hou
Guanghui He
SupR
MQ
20
31
0
07 Nov 2022
Fast Adaptive Federated Bilevel Optimization
Feihu Huang
FedML
20
7
0
02 Nov 2022
Stable Deep MRI Reconstruction using Generative Priors
Martin Zach
Florian Knoll
T. Pock
OOD
MedIm
DiffM
29
17
0
25 Oct 2022
Causal Structural Hypothesis Testing and Data Generation Models
Jeffrey Q. Jiang
Omead Brandon Pooladzandi
Sunay Bhat
Gregory Pottie
CML
40
1
0
20 Oct 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang
Hong Wang
Haobo Yuan
Zhouchen Lin
21
9
0
12 Oct 2022
Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL
35
4
0
21 Aug 2022
DCNNV-19: A Deep Convolutional Neural Network for COVID-19 Detection in Chest Computed Tomographies
Victor Felipe Reis-Silva
16
0
0
18 Aug 2022
Efficient Adaptive Ensembling for Image Classification
A. Bruno
Davide Moroni
M. Martinelli
28
18
0
15 Jun 2022
Unified Recurrence Modeling for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
Oswald Lanz
21
8
0
02 Jun 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
16
2
0
24 Mar 2022
Ligandformer: A Graph Neural Network for Predicting Compound Property with Robust Interpretation
Jinjiang Guo
Qi Liu
Han Guo
Xi Lu
AI4CE
18
3
0
21 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
24
20
0
12 Feb 2022
Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks
J. Pérez
Patricia Arroba
Jose M. Moya
27
14
0
12 Jan 2022