ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2010.07468

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

15 October 2020
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
    ODL

Papers citing "AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients"

50 / 74 papers shown
Title
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
54
0
0
01 May 2025
Modelling Mean-Field Games with Neural Ordinary Differential Equations
Anna C. M. Thöni
Yoram Bachrach
Tal Kachman
35
0
0
17 Apr 2025
Molecular Learning Dynamics
Yaroslav Gusev
Vitaly Vanchurin
25
1
0
14 Apr 2025
Covariant Gradient Descent
Dmitry Guskov
Vitaly Vanchurin
26
2
0
07 Apr 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
138
0
0
22 Jan 2025
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao-quan Song
ODL
87
2
0
22 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Z. Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
191
1
0
16 Dec 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
108
5
0
25 Nov 2024
Gradient-free variational learning with conditional mixture networks
Conor Heins
Hao Wu
Dimitrije Marković
Alexander Tschantz
Jeff Beck
Christopher L. Buckley
BDL
31
2
0
29 Aug 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
76
2
0
26 May 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
21
2
0
17 Jan 2024
BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
Zelin Ni
Hang Yu
Shizhan Liu
Jianguo Li
Weiyao Lin
AI4TS
26
30
0
31 Oct 2023
Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal
Leah A. Chrestien
Tomás Pevný
Stefan Edelkamp
Antonín Komenda
34
9
0
30 Oct 2023
Enhancing Low-Order Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier–Stokes Equations
Shinhoo Kang
Emil M. Constantinescu
AI4CE
22
0
0
29 Oct 2023
Smooth Exact Gradient Descent Learning in Spiking Neural Networks
Christian Klos
Raoul-Martin Memmesheimer
41
5
0
25 Sep 2023
Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments
Najmeh Mohammadbagheri
Fardin Ayar
A. Nickabadi
R. Safabakhsh
CVBM
GAN
24
3
0
25 Sep 2023
Automatic Differentiation for Inverse Problems with Applications in Quantum Transport
I. Williams
E. Polizzi
16
1
0
18 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
20
41
0
12 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
27
4
0
02 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
25
8
0
26 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
28
0
0
25 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
32
128
0
23 May 2023
MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation
Abdul Rehman Khan
Asifullah Khan
ViT
MedIm
34
14
0
15 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
30
10
0
12 May 2023
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Drew Penney
Bin Li
Lizhong Chen
J. Sydir
Anna Drewek-Ossowicka
R. Illikkal
Charlie Tai
R. Iyer
Andrew J. Herdrich
26
1
0
10 Apr 2023
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization
Feihu Huang
Chunyu Xuan
Xinrui Wang
Siqi Zhang
Songcan Chen
28
7
0
07 Mar 2023
Learning Gradually Non-convex Image Priors Using Score Matching
Erich Kobler
T. Pock
38
3
0
21 Feb 2023
Dataset Distillation with Convexified Implicit Gradients
Noel Loo
Ramin Hasani
Mathias Lechner
Daniela Rus
DD
29
41
0
13 Feb 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
55
350
0
13 Feb 2023
Weight Prediction Boosts the Convergence of AdamW
Lei Guan
21
15
0
01 Feb 2023
Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization
Davood Wadi
M. Fredette
S. Sénécal
ODL
AI4CE
6
0
0
24 Jan 2023
Denoising Diffusion for Sampling SAT Solutions
Kārlis Freivalds
Sergejs Kozlovics
13
2
0
30 Nov 2022
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
Enneng Yang
Junwei Pan
Ximei Wang
Haibin Yu
Li Shen
Xihua Chen
Lei Xiao
Jie Jiang
G. Guo
38
43
0
28 Nov 2022
Dealing with missing data using attention and latent space regularization
J. Penny-Dimri
Christoph Bergmeir
Julian Smith
25
0
0
14 Nov 2022
Black Box Lie Group Preconditioners for SGD
Xi-Lin Li
11
8
0
08 Nov 2022
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Andrey D. Ignatov
Radu Timofte
Maurizio Denna
Abdelbadie Younes
Ganzorig Gankhuyag
...
Jing Liu
Garas Gendy
Nabil Sabor
J. Hou
Guanghui He
SupR
MQ
20
31
0
07 Nov 2022
Fast Adaptive Federated Bilevel Optimization
Feihu Huang
FedML
20
7
0
02 Nov 2022
Stable Deep MRI Reconstruction using Generative Priors
Martin Zach
Florian Knoll
T. Pock
OOD
MedIm
DiffM
29
17
0
25 Oct 2022
Causal Structural Hypothesis Testing and Data Generation Models
Jeffrey Q. Jiang
Omead Brandon Pooladzandi
Sunay Bhat
Gregory Pottie
CML
37
1
0
20 Oct 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang
Hong Wang
Haobo Yuan
Zhouchen Lin
16
9
0
12 Oct 2022
Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL
30
4
0
21 Aug 2022
DCNNV-19: A Deep Convolutional Neural Network for COVID-19 Detection in Chest Computed Tomographies
Victor Felipe Reis-Silva
14
0
0
18 Aug 2022
Efficient Adaptive Ensembling for Image Classification
A. Bruno
Davide Moroni
M. Martinelli
23
18
0
15 Jun 2022
Unified Recurrence Modeling for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
O. Lanz
21
8
0
02 Jun 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
11
2
0
24 Mar 2022
Ligandformer: A Graph Neural Network for Predicting Compound Property with Robust Interpretation
Jinjiang Guo
Qi Liu
Han Guo
Xi Lu
AI4CE
16
3
0
21 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
22
20
0
12 Feb 2022
Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks
J. Pérez
Patricia Arroba
Jose M. Moya
27
14
0
12 Jan 2022
BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
Chris Cundy
Aditya Grover
Stefano Ermon
CML
40
72
0
06 Dec 2021
Pyramid Adversarial Training Improves ViT Performance
Charles Herrmann
Kyle Sargent
Lu Jiang
Ramin Zabih
Huiwen Chang
Ce Liu
Dilip Krishnan
Deqing Sun
ViT
26
56
0
30 Nov 2021