Don't Decay the Learning Rate, Increase the Batch Size

1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
    ODL
ArXiv (abs) · PDF · HTML
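
As a quick, hedged illustration of the idea named in the title (not taken from this page): the sketch below trains a small PyTorch model and, at the epochs where one would normally apply a step decay of the learning rate, multiplies the batch size by the same factor instead. The dataset, model, schedule points, and growth factor are illustrative assumptions, not the authors' experimental setup.

# Minimal sketch, assuming PyTorch and synthetic data; schedule values are arbitrary.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(4096, 20), torch.randn(4096, 1)   # synthetic regression data
dataset = TensorDataset(X, y)

model = nn.Linear(20, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr stays fixed
loss_fn = nn.MSELoss()

batch_size, growth_factor = 64, 2        # grow the batch instead of shrinking the lr
for epoch in range(6):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    if epoch in (1, 3):                  # where a step decay would usually occur
        batch_size *= growth_factor      # comparable reduction in gradient-noise scale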

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Dinesh Manocha
97
39
0
28 Mar 2023
Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach
Adel Javanmard
Vahab Mirrokni
Pratik Worah
76
7
0
27 Mar 2023
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
56
11
0
05 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
94
172
0
03 Mar 2023
FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling
Wei-Yin Ko
Daniel D'souza
Karina Nguyen
Randall Balestriero
Sara Hooker
FedML
82
11
0
01 Mar 2023
Particle-based Online Bayesian Sampling
Yifan Yang
Chang-rui Liu
Zhengze Zhang
BDL
97
8
0
28 Feb 2023
Why is parameter averaging beneficial in SGD? An objective smoothing perspective
Atsushi Nitanda
Ryuhei Kikuchi
Shugo Maeda
Denny Wu
FedML
58
0
0
18 Feb 2023
Topology-aware Federated Learning in Edge Computing: A Comprehensive Survey
Jiajun Wu
Steve Drew
Fan Dong
Zhuangdi Zhu
Jiayu Zhou
FedML
128
53
0
06 Feb 2023
Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper
Wentao Guo
Khiem Pham
Tiancheng Yuan
Charlie F. Ruan
Yucheng Lu
Chris De Sa
163
7
0
02 Feb 2023
Deep networks for system identification: a Survey
G. Pillonetto
Aleksandr Aravkin
Daniel Gedon
L. Ljung
Antônio H. Ribeiro
Thomas B. Schon
OOD
111
45
0
30 Jan 2023
DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning
Xu Hu
Jian Zhao
Wen-gang Zhou
Ruili Feng
Houqiang Li
64
1
0
25 Jan 2023
ScaDLES: Scalable Deep Learning over Streaming data at the Edge
S. Tyagi
Martin Swany
52
6
0
21 Jan 2023
A reinforcement learning path planning approach for range-only underwater target localization with autonomous vehicles
Ivan Masmitja
Mario Martin
K. Katija
S. Gomáriz
J. Navarro
52
6
0
17 Jan 2023
Contextually Enhanced ES-dRNN with Dynamic Attention for Short-Term Load Forecasting
Slawek Smyl
Grzegorz Dudek
Paweł Pełka
AI4TS
74
14
0
18 Dec 2022
Maximal Initial Learning Rates in Deep ReLU Networks
Gaurav M. Iyer
Boris Hanin
David Rolnick
83
10
0
14 Dec 2022
Accelerating Self-Supervised Learning via Efficient Training Strategies
Mustafa Taha Koçyiğit
Timothy M. Hospedales
Hakan Bilen
SSL
68
8
0
11 Dec 2022
Error-aware Quantization through Noise Tempering
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
50
2
0
11 Dec 2022
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning
Young Geun Kim
Carole-Jean Wu
FedML
95
5
0
30 Nov 2022
Aspects of scaling and scalability for flow-based sampling of lattice QCD
Ryan Abbott
M. S. Albergo
Aleksandar Botev
D. Boyda
Kyle Cranmer
...
Ali Razavi
Danilo Jimenez Rezende
F. Romero-López
P. Shanahan
Julian M. Urban
116
33
0
14 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN MoE AI4CE
58
1
0
11 Nov 2022
Adaptive scaling of the learning rate by second order automatic differentiation
F. Gournay
Alban Gossard
ODL
57
2
0
26 Oct 2022
On Robust Incremental Learning over Many Multilingual Steps
Karan Praharaj
Irina Matveeva
CLL
57
1
0
25 Oct 2022
Large Batch and Patch Size Training for Medical Image Segmentation
Junya Sato
Shoji Kido
48
2
0
24 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
67
9
0
24 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
85
12
0
21 Oct 2022
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
Arsany Guirguis
Diana Petrescu
Florin Dinu
D. Quoc
Javier Picorel
R. Guerraoui
75
0
0
16 Oct 2022
AnalogVNN: A fully modular framework for modeling and optimizing photonic neural networks
Vivswan Shah
Nathan Youngblood
85
4
0
14 Oct 2022
Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?
Patrick Dendorfer
V. Yugay
Aljosa Osep
Laura Leal-Taixé
118
42
0
14 Oct 2022
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT MLT
103
83
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
118
51
0
13 Oct 2022
Characterization of anomalous diffusion through convolutional transformers
Nicolás Firbas
Òscar Garibo i Orts
M. Garcia-March
J. A. Conejero
86
19
0
10 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
93
22
0
30 Sep 2022
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian
Eshaan Nichani
Jason D. Lee
112
88
0
30 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
135
32
0
27 Sep 2022
Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL
60
4
0
21 Aug 2022
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Jie You
Jaehoon Chung
Mosharaf Chowdhury
84
82
0
12 Aug 2022
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Arjun Ashok
K. J. Joseph
V. Balasubramanian
CLL
61
29
0
07 Aug 2022
Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning
Serge Kas Hanna
Rawad Bitar
Parimal Parag
Venkateswara Dasari
S. E. Rouayheb
93
3
0
04 Aug 2022
Dynamic Batch Adaptation
Cristian Simionescu
George Stoica
Robert Herscovici
ODL
66
1
0
01 Aug 2022
Efficient NLP Model Finetuning via Multistage Data Filtering
Ouyang Xu
S. Ansari
F. Lin
Yangfeng Ji
74
4
0
28 Jul 2022
On the benefits of non-linear weight updates
Paul Norridge
57
0
0
25 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
97
8
0
19 Jul 2022
Towards understanding how momentum improves generalization in deep learning
Samy Jelassi
Yuanzhi Li
ODL MLT AI4CE
90
38
0
13 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
125
17
0
01 Jul 2022
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness
Hideaki Iiduka
73
5
0
27 Jun 2022
On the Maximum Hessian Eigenvalue and Generalization
Simran Kaur
Jérémy E. Cohen
Zachary Chase Lipton
126
43
0
21 Jun 2022
Pisces: Efficient Federated Learning via Guided Asynchronous Training
Zhifeng Jiang
Wei Wang
Baochun Li
Yue Liu
FedML
73
25
0
18 Jun 2022
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Adam W. Harley
Zhaoyuan Fang
Jie Li
Rares Andrei Ambrus
Katerina Fragkiadaki
120
131
0
16 Jun 2022
On the fast convergence of minibatch heavy ball momentum
Raghu Bollapragada
Tyler Chen
Rachel A. Ward
122
19
0
15 Jun 2022
Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Zhiqi Bu
Yu Wang
Sheng Zha
George Karypis
141
72
0
14 Jun 2022