Don't Decay the Learning Rate, Increase the Batch Size

1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
    ODL
arXiv:1711.00489

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown
HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems
Zheng Lin
Zhe Chen
Xianhao Chen
Wei Ni
Yue Gao
FedML
34
0
0
10 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
39
0
0
08 Jun 2025
Variational Adaptive Noise and Dropout towards Stable Recurrent Neural Networks
Taisuke Kobayashi
Shingo Murata
56
0
0
02 Jun 2025
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
William Merrill
Shane Arora
Dirk Groeneveld
Hannaneh Hajishirzi
55
0
0
29 May 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
112
0
0
28 May 2025
Variational Deep Learning via Implicit Regularization
Jonathan Wenger
Beau Coker
Juraj Marusic
John P. Cunningham
OOD UQCV BDL
64
0
0
26 May 2025
A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
Chen Gong
Rui Xing
Zhenzhe Zheng
Fan Wu
68
0
0
22 May 2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
80
2
0
19 May 2025
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Yezhen Wang
Zhouhao Yang
Brian K Chen
Fanyi Pu
Yue Liu
Tianyu Gao
Kenji Kawaguchi
93
0
0
03 May 2025
A Langevin sampling algorithm inspired by the Adam optimizer
Benedict Leimkuhler
René Lohmann
Peter Whalley
173
0
0
26 Apr 2025
Representation Improvement in Latent Space for Search-Based Testing of Autonomous Robotic Systems
D. Humeniuk
Foutse Khomh
112
0
0
26 Mar 2025
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
S. Tyagi
Prateek Sharma
141
0
0
21 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
135
1
0
04 Mar 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
109
8
0
21 Feb 2025
Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
Hikaru Umeda
Hideaki Iiduka
160
2
0
17 Feb 2025
Linear Mode Connectivity in Differentiable Tree Ensembles
Ryuichi Kanoh
M. Sugiyama
241
1
0
17 Feb 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
116
6
0
25 Jan 2025
Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
Keisuke Kamo
Hideaki Iiduka
129
0
0
15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
118
4
0
10 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
205
3
0
03 Jan 2025
A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation
Xiaoqian Liu
Yangfan Du
Jiadong Wang
Yuan Ge
Chen Xu
Tong Xiao
Guocheng Chen
Jingbo Zhu
147
0
0
31 Dec 2024
Weber-Fechner Law in Temporal Difference learning derived from Control as Inference
Keiichiro Takahashi
Taisuke Kobayashi
Tomoya Yamanokuchi
Takamitsu Matsubara
76
0
0
31 Dec 2024
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
473
0
0
30 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
200
7
0
17 Dec 2024
Impact of Privacy Parameters on Deep Learning Models for Image Classification
Basanta Chaulagain
96
0
0
09 Dec 2024
Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods
Jiamian Hu
Yuanyuan Hong
Yihua Chen
He Wang
Moriaki Yasuhara
131
1
0
03 Dec 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
203
18
0
29 Oct 2024
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada
Hideaki Iiduka
65
1
0
16 Sep 2024
Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions
Sully F. Chen
Robert J. Steele
Glen M. Hocky
Beakal Lemeneh
S. Lad
Eric Oermann
AI4CE
93
0
0
29 Aug 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
142
1
0
26 Aug 2024
Scaling Law with Learning Rate Annealing
Howe Tissue
Venus Wang
Lu Wang
108
9
0
20 Aug 2024
Stochastic weight matrix dynamics during learning and Dyson Brownian motion
Gert Aarts
B. Lucini
Chanju Park
86
1
0
23 Jul 2024
Localizing Anomalies via Multiscale Score Matching Analysis
Ahsan Mahmood
Junier Oliva
M. Styner
51
1
0
28 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
92
1
0
20 Jun 2024
Meta-Learning Neural Procedural Biases
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
107
1
0
12 Jun 2024
Primitive Agentic First-Order Optimization
R. Sala
73
0
0
07 Jun 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
127
45
0
28 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
101
9
0
21 May 2024
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala
Jeffrey Pennington
112
4
0
30 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
131
347
0
09 Apr 2024
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Mohammed Ghaith Altarabichi
Sławomir Nowaczyk
Sepideh Pashami
Peyman Sheikholharam Mashhadi
Julia Handl
42
11
0
05 Apr 2024
AdaptSFL: Adaptive Split Federated Learning in Resource-constrained Edge Networks
Zhengyi Lin
Guanqiao Qu
Wei Wei
Xianhao Chen
Kin K. Leung
130
51
0
19 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
111
1
0
12 Mar 2024
A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing
Yu Wang
Wen Qu
92
0
0
04 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
89
2
0
29 Feb 2024
Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen
Junru Wu
Zhangyang Wang
Boris Hanin
AI4CE
104
0
0
27 Feb 2024
Iteration and Stochastic First-order Oracle Complexities of Stochastic Gradient Descent using Constant and Decaying Learning Rates
Kento Imaizumi
Hideaki Iiduka
63
2
0
23 Feb 2024
Scaling physics-informed hard constraints with mixture-of-experts
N. Chalapathi
Yiheng Du
Aditi Krishnapriyan
AI4CE
102
16
0
20 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
85
6
0
17 Feb 2024
A Framework For Gait-Based User Demography Estimation Using Inertial Sensors
C. Swami
40
1
0
15 Feb 2024