2010.07468
Cited By
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
15 October 2020
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
ODL
Papers citing
"AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients"
50 / 74 papers shown
Title
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
54
0
0
01 May 2025
Modelling Mean-Field Games with Neural Ordinary Differential Equations
Anna C. M. Thöni
Yoram Bachrach
Tal Kachman
35
0
0
17 Apr 2025
Molecular Learning Dynamics
Yaroslav Gusev
Vitaly Vanchurin
25
1
0
14 Apr 2025
Covariant Gradient Descent
Dmitry Guskov
Vitaly Vanchurin
26
2
0
07 Apr 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
138
0
0
22 Jan 2025
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao-quan Song
ODL
87
2
0
22 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Z. Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
191
1
0
16 Dec 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
108
5
0
25 Nov 2024
Gradient-free variational learning with conditional mixture networks
Conor Heins
Hao Wu
Dimitrije Marković
Alexander Tschantz
Jeff Beck
Christopher L. Buckley
BDL
31
2
0
29 Aug 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
76
2
0
26 May 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
21
2
0
17 Jan 2024
BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
Zelin Ni
Hang Yu
Shizhan Liu
Jianguo Li
Weiyao Lin
AI4TS
26
30
0
31 Oct 2023
Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal
Leah A. Chrestien
Tomás Pevný
Stefan Edelkamp
Antonín Komenda
34
9
0
30 Oct 2023
Enhancing Low-Order Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier–Stokes Equations
Shinhoo Kang
Emil M. Constantinescu
AI4CE
22
0
0
29 Oct 2023
Smooth Exact Gradient Descent Learning in Spiking Neural Networks
Christian Klos
Raoul-Martin Memmesheimer
41
5
0
25 Sep 2023
Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments
Najmeh Mohammadbagheri
Fardin Ayar
A. Nickabadi
R. Safabakhsh
CVBM
GAN
24
3
0
25 Sep 2023
Automatic Differentiation for Inverse Problems with Applications in Quantum Transport
I. Williams
E. Polizzi
16
1
0
18 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
20
41
0
12 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
27
4
0
02 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
25
8
0
26 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
28
0
0
25 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
32
128
0
23 May 2023
MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation
Abdul Rehman Khan
Asifullah Khan
ViT
MedIm
34
14
0
15 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
30
10
0
12 May 2023
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Drew Penney
Bin Li
Lizhong Chen
J. Sydir
Anna Drewek-Ossowicka
R. Illikkal
Charlie Tai
R. Iyer
Andrew J. Herdrich
26
1
0
10 Apr 2023
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization
Feihu Huang
Chunyu Xuan
Xinrui Wang
Siqi Zhang
Songcan Chen
28
7
0
07 Mar 2023
Learning Gradually Non-convex Image Priors Using Score Matching
Erich Kobler
T. Pock
38
3
0
21 Feb 2023
Dataset Distillation with Convexified Implicit Gradients
Noel Loo
Ramin Hasani
Mathias Lechner
Daniela Rus
DD
29
41
0
13 Feb 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
55
350
0
13 Feb 2023
Weight Prediction Boosts the Convergence of AdamW
Lei Guan
21
15
0
01 Feb 2023
Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization
Davood Wadi
M. Fredette
S. Sénécal
ODL
AI4CE
6
0
0
24 Jan 2023
Denoising Diffusion for Sampling SAT Solutions
Kārlis Freivalds
Sergejs Kozlovics
13
2
0
30 Nov 2022
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
Enneng Yang
Junwei Pan
Ximei Wang
Haibin Yu
Li Shen
Xihua Chen
Lei Xiao
Jie Jiang
G. Guo
38
43
0
28 Nov 2022
Dealing with missing data using attention and latent space regularization
J. Penny-Dimri
Christoph Bergmeir
Julian Smith
25
0
0
14 Nov 2022
Black Box Lie Group Preconditioners for SGD
Xi-Lin Li
11
8
0
08 Nov 2022
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Andrey D. Ignatov
Radu Timofte
Maurizio Denna
Abdelbadie Younes
Ganzorig Gankhuyag
...
Jing Liu
Garas Gendy
Nabil Sabor
J. Hou
Guanghui He
SupR
MQ
20
31
0
07 Nov 2022
Fast Adaptive Federated Bilevel Optimization
Feihu Huang
FedML
20
7
0
02 Nov 2022
Stable Deep MRI Reconstruction using Generative Priors
Martin Zach
Florian Knoll
T. Pock
OOD
MedIm
DiffM
29
17
0
25 Oct 2022
Causal Structural Hypothesis Testing and Data Generation Models
Jeffrey Q. Jiang
Omead Brandon Pooladzandi
Sunay Bhat
Gregory Pottie
CML
37
1
0
20 Oct 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang
Hong Wang
Haobo Yuan
Zhouchen Lin
16
9
0
12 Oct 2022
Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL
30
4
0
21 Aug 2022
DCNNV-19: A Deep Convolutional Neural Network for COVID-19 Detection in Chest Computed Tomographies
Victor Felipe Reis-Silva
14
0
0
18 Aug 2022
Efficient Adaptive Ensembling for Image Classification
A. Bruno
Davide Moroni
M. Martinelli
23
18
0
15 Jun 2022
Unified Recurrence Modeling for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
O. Lanz
21
8
0
02 Jun 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
11
2
0
24 Mar 2022
Ligandformer: A Graph Neural Network for Predicting Compound Property with Robust Interpretation
Jinjiang Guo
Qi Liu
Han Guo
Xi Lu
AI4CE
16
3
0
21 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
22
20
0
12 Feb 2022
Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks
J. Pérez
Patricia Arroba
Jose M. Moya
27
14
0
12 Jan 2022
BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
Chris Cundy
Aditya Grover
Stefano Ermon
CML
40
72
0
06 Dec 2021
Pyramid Adversarial Training Improves ViT Performance
Charles Herrmann
Kyle Sargent
Lu Jiang
Ramin Zabih
Huiwen Chang
Ce Liu
Dilip Krishnan
Deqing Sun
ViT
26
56
0
30 Nov 2021