Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
1 November 2017 · arXiv:1711.00489 · ODL
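The paper's recipe, roughly: wherever a step-wise schedule would divide the learning rate by a factor k, multiply the batch size by k instead and keep the learning rate fixed. A minimal PyTorch-style sketch of that schedule follows; the model, dataset, and the hyperparameters (lr, base_bs, k, the phase lengths) are illustrative assumptions, not the authors' released code:

    import torch
    from torch.utils.data import DataLoader

    def train(model, dataset, lr=0.1, base_bs=128, k=5, phases=(30, 30, 30)):
        # Learning rate stays fixed for the whole run; only the batch grows.
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        loss_fn = torch.nn.CrossEntropyLoss()
        bs = base_bs
        for epochs in phases:
            # Rebuild the loader at each phase boundary with the larger batch.
            loader = DataLoader(dataset, batch_size=bs, shuffle=True)
            for _ in range(epochs):
                for x, y in loader:
                    opt.zero_grad()
                    loss_fn(model(x), y).backward()
                    opt.step()
            bs *= k  # grow the batch where a schedule would divide lr by k

Both moves shrink the scale of the SGD gradient noise by the same factor, which is the paper's argument for why the two schedules reach similar test accuracy while the batch-size schedule permits more parallelism.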

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

Showing 50 of 454 citing papers.
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
  Zhengchun Liu, R. Kettimuthu, M. Papka, Ian Foster · 22 Jun 2021
Randomness In Neural Network Training: Characterizing The Impact of Tooling
  Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker · 22 Jun 2021
Deep Learning Through the Lens of Example Difficulty
  R. Baldock, Hartmut Maennel, Behnam Neyshabur · 17 Jun 2021
An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models
  Xueqing Liu, Chi Wang · 17 Jun 2021
On Large-Cohort Training for Federated Learning
  Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021 · FedML
Federated Learning with Buffered Asynchronous Aggregation
  John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba · 11 Jun 2021 · FedML
Label Noise SGD Provably Prefers Flat Global Minimizers
  Alexandru Damian, Tengyu Ma, Jason D. Lee · 11 Jun 2021 · NoLa
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
  J. Lamy-Poirier · 04 Jun 2021 · MoE
Concurrent Adversarial Learning for Large-Batch Training
  Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · 01 Jun 2021 · ODL
Maximizing Parallelism in Distributed Training for Huge Neural Networks
  Zhengda Bian, Qifan Xu, Boxiang Wang, Yang You · 30 May 2021 · MoE
Rethinking "Batch" in BatchNorm
Rethinking "Batch" in BatchNorm
Yuxin Wu
Justin Johnson
BDL
125
66
0
17 May 2021
DoS and DDoS Mitigation Using Variational Autoencoders
DoS and DDoS Mitigation Using Variational Autoencoders
Eirik Molde Bårli
Anis Yazidi
E. Herrera-Viedma
H. Haugerud
AAMLDRL
36
16
0
14 May 2021
Deep Neural Network as an alternative to Boosted Decision Trees for PID
Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev
Riccardo Riva
Michele Umassi
PINN
47
1
0
28 Apr 2021
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep
  Learning
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Shijian Li
Oren Mangoubi
Lijie Xu
Tian Guo
101
15
0
16 Apr 2021
NeuSE: A Neural Snapshot Ensemble Method for Collaborative Filtering
NeuSE: A Neural Snapshot Ensemble Method for Collaborative Filtering
Dongsheng Li
Haodong Liu
Chao Chen
Yingying Zhao
Stephen M. Chu
Bo Yang
FedML
44
5
0
15 Apr 2021
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to
  Improve Generalization
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie
Li-xin Yuan
Zhanxing Zhu
Masashi Sugiyama
123
30
0
31 Mar 2021
Empirically explaining SGD from a line search perspective
Empirically explaining SGD from a line search perspective
Max Mutschler
A. Zell
ODLLRM
71
4
0
31 Mar 2021
Exploiting Invariance in Training Deep Neural Networks
Exploiting Invariance in Training Deep Neural Networks
Chengxi Ye
Xiong Zhou
Tristan McKinney
Yanfeng Liu
Qinggang Zhou
Fedor Zhdanov
38
4
0
30 Mar 2021
Policy Information Capacity: Information-Theoretic Measure for Task
  Complexity in Deep Reinforcement Learning
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
Hiroki Furuta
T. Matsushima
Tadashi Kozuno
Y. Matsuo
Sergey Levine
Ofir Nachum
S. Gu
OffRL
58
14
0
23 Mar 2021
Demystifying the Effects of Non-Independence in Federated Learning
Demystifying the Effects of Non-Independence in Federated Learning
Stefan Arnold
Dilara Yesilbas
FedML
43
4
0
20 Mar 2021
Why flatness does and does not correlate with generalization for deep neural networks
  Shuo Zhang, Isaac Reid, Guillermo Valle Pérez, A. Louis · 10 Mar 2021
Physical Activity Recognition Based on a Parallel Approach for an Ensemble of Machine Learning and Deep Learning Classifiers
  Mariem Abid, Amal Khabou, Y. Ouakrim, Hugo Watel, Safouene Chemcki, A. Mitiche, Amel Benazza-Benyahia, N. Mezghani · 02 Mar 2021
Acceleration via Fractal Learning Rate Schedules
  Naman Agarwal, Surbhi Goel, Cyril Zhang · 01 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
  Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos · 28 Feb 2021
Scalable federated machine learning with FEDn
  Morgan Ekmefjord, Addi Ait-Mlouk, Sadi Alawadi, Mattias Åkesson, Desislava Stoyanova, O. Spjuth, Salman Toor, Andreas Hellander · 27 Feb 2021 · FedML
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
  Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021
Topological Obstructions to Autoencoding
  Joshua D. Batson, C. G. Haaf, Yonatan Kahn, Daniel A. Roberts · 16 Feb 2021 · AI4CE
Identifying Misinformation from Website Screenshots
  S. Abdali, Rutuja Gurav, S. Menon, Daniel Fonseca, Negin Entezari, Neil Shah, Evangelos E. Papalexakis · 15 Feb 2021
Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
  Guojun Xiong, Gang Yan, Rahul Singh, Jian Li · 11 Feb 2021
Large-Scale Training System for 100-Million Classification at Alibaba
  Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin · 09 Feb 2021
On the Origin of Implicit Regularization in Stochastic Gradient Descent
  Samuel L. Smith, Benoit Dherin, David Barrett, Soham De · 28 Jan 2021 · MLT
To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices
  Liang Li, Dian Shi, Ronghui Hou, Hui Li, Miao Pan, Zhu Han · 22 Dec 2020 · FedML
Regularization in network optimization via trimmed stochastic gradient descent with noisy label
  Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong · 21 Dec 2020 · NoLa
Recent advances in deep learning theory
  Fengxiang He, Dacheng Tao · 20 Dec 2020 · AI4CE
Data optimization for large batch distributed training of deep neural networks
  Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 16 Dec 2020
Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
  Fengli Gao, Huicai Zhong · 16 Dec 2020 · ODL
An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
  Federico Zocco, Seán F. McLoone · 14 Dec 2020 · ODL
Warm Starting CMA-ES for Hyperparameter Optimization
  Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi · 13 Dec 2020
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
  Erik Wijmans, Irfan Essa, Dhruv Batra · 11 Dec 2020 · 3DPC
Towards constraining warm dark matter with stellar streams through neural simulation-based inference
  Joeri Hermans, N. Banik, Christoph Weniger, G. Bertone, Gilles Louppe · 30 Nov 2020
On Generalization of Adaptive Methods for Over-parameterized Linear Regression
  Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi · 28 Nov 2020 · AI4CE
Cross-Camera Convolutional Color Constancy
  Mahmoud Afifi, Jonathan T. Barron, Chloe LeGendre, Yun-Ta Tsai, Francois Bleibel · 24 Nov 2020
Long Short Term Memory Networks for Bandwidth Forecasting in Mobile Broadband Networks under Mobility
  Konstantinos Kousias, A. Pappas, Özgü Alay, A. Argyriou, Michael Riegler · 20 Nov 2020
On tuning deep learning models: a data mining perspective
  M. Öztürk · 19 Nov 2020
Contrastive Weight Regularization for Large Minibatch SGD
  Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu · 17 Nov 2020 · OffRL
Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
  Lorenzo Valerio, F. M. Nardini, A. Passarella, R. Perego · 17 Nov 2020
Adaptive Federated Dropout: Improving Communication Efficiency and Generalization for Federated Learning
  Nader Bouacida, Jiahui Hou, H. Zang, Xin Liu · 08 Nov 2020 · FedML
Reverse engineering learned optimizers reveals known and novel mechanisms
  Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein · 04 Nov 2020
Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
  Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos · 29 Oct 2020
Stochastic Optimization with Laggard Data Pipelines
  Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang · 26 Oct 2020