Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (1 November 2017) [ODL]
arXiv:1711.00489
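The headline paper's prescription is simple enough to state in code: where a step learning-rate schedule would divide the learning rate by some factor, keep the learning rate fixed and multiply the batch size B by that factor instead, so the SGD noise scale g ≈ εN/B (N the training set size) follows the same decay. Below is a minimal sketch, assuming a PyTorch-style training loop; the function name, milestones, and growth factor are illustrative, not taken from the paper's code.

```python
# Minimal sketch of the schedule swap proposed by the headline paper,
# assuming a PyTorch-style loop. All names and hyperparameters here are
# illustrative assumptions.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=30, lr=0.1, batch_size=128,
          growth=5, milestones=(10, 20)):
    # Learning rate stays fixed for the whole run: no LR decay schedule.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        if epoch in milestones:
            # Where a step schedule would divide the LR by `growth`,
            # multiply the batch size by `growth` instead; the noise
            # scale g ~ lr * N / B falls by the same factor either way.
            batch_size *= growth
            loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```

The practical payoff the paper reports is fewer parameter updates for the same learning curve, since the larger late-stage batches can also improve hardware utilization.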
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size" (50 of 454 shown):

BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu, R. Kettimuthu, M. Papka, Ian Foster (22 Jun 2021)

Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker (22 Jun 2021)

Deep Learning Through the Lens of Example Difficulty
R. Baldock, Hartmut Maennel, Behnam Neyshabur (17 Jun 2021)

An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models
Xueqing Liu, Chi Wang (17 Jun 2021)

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith (15 Jun 2021) [FedML]

Federated Learning with Buffered Asynchronous Aggregation
John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba (11 Jun 2021) [FedML]

Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian, Tengyu Ma, Jason D. Lee (11 Jun 2021) [NoLa]

Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier (04 Jun 2021) [MoE]

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You (01 Jun 2021) [ODL]

Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian, Qifan Xu, Boxiang Wang, Yang You (30 May 2021) [MoE]

Rethinking "Batch" in BatchNorm
Yuxin Wu, Justin Johnson (17 May 2021) [BDL]

DoS and DDoS Mitigation Using Variational Autoencoders
Eirik Molde Bårli, Anis Yazidi, E. Herrera-Viedma, H. Haugerud (14 May 2021) [AAML, DRL]

Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev, Riccardo Riva, Michele Umassi (28 Apr 2021) [PINN]

Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo (16 Apr 2021)

NeuSE: A Neural Snapshot Ensemble Method for Collaborative Filtering
Dongsheng Li, Haodong Liu, Chao Chen, Yingying Zhao, Stephen M. Chu, Bo Yang (15 Apr 2021) [FedML]

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama (31 Mar 2021)

Empirically explaining SGD from a line search perspective
Max Mutschler, A. Zell (31 Mar 2021) [ODL, LRM]

Exploiting Invariance in Training Deep Neural Networks
Chengxi Ye, Xiong Zhou, Tristan McKinney, Yanfeng Liu, Qinggang Zhou, Fedor Zhdanov (30 Mar 2021)

Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
Hiroki Furuta, T. Matsushima, Tadashi Kozuno, Y. Matsuo, Sergey Levine, Ofir Nachum, S. Gu (23 Mar 2021) [OffRL]

Demystifying the Effects of Non-Independence in Federated Learning
Stefan Arnold, Dilara Yesilbas (20 Mar 2021) [FedML]

Why flatness does and does not correlate with generalization for deep neural networks
Shuo Zhang, Isaac Reid, Guillermo Valle Pérez, A. Louis (10 Mar 2021)

Physical Activity Recognition Based on a Parallel Approach for an Ensemble of Machine Learning and Deep Learning Classifiers
Mariem Abid, Amal Khabou, Y. Ouakrim, Hugo Watel, Safouene Chemcki, A. Mitiche, Amel Benazza-Benyahia, N. Mezghani (02 Mar 2021)

Acceleration via Fractal Learning Rate Schedules
Naman Agarwal, Surbhi Goel, Cyril Zhang (01 Mar 2021)

On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos (28 Feb 2021)

Scalable federated machine learning with FEDn
Morgan Ekmefjord, Addi Ait-Mlouk, Sadi Alawadi, Mattias Åkesson, Desislava Stoyanova, O. Spjuth, Salman Toor, Andreas Hellander (27 Feb 2021) [FedML]

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)

Topological Obstructions to Autoencoding
Joshua D. Batson, C. G. Haaf, Yonatan Kahn, Daniel A. Roberts (16 Feb 2021) [AI4CE]

Identifying Misinformation from Website Screenshots
S. Abdali, Rutuja Gurav, S. Menon, Daniel Fonseca, Negin Entezari, Neil Shah, Evangelos E. Papalexakis (15 Feb 2021)

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong, Gang Yan, Rahul Singh, Jian Li (11 Feb 2021)

Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin (09 Feb 2021)

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De (28 Jan 2021) [MLT]

To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices
Liang Li, Dian Shi, Ronghui Hou, Hui Li, Miao Pan, Zhu Han (22 Dec 2020) [FedML]

Regularization in network optimization via trimmed stochastic gradient descent with noisy label
Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong (21 Dec 2020) [NoLa]

Recent advances in deep learning theory
Fengxiang He, Dacheng Tao (20 Dec 2020) [AI4CE]

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar (16 Dec 2020)

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong (16 Dec 2020) [ODL]

An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Federico Zocco, Seán F. McLoone (14 Dec 2020) [ODL]

Warm Starting CMA-ES for Hyperparameter Optimization
Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi (13 Dec 2020)

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans, Irfan Essa, Dhruv Batra (11 Dec 2020) [3DPC]

Towards constraining warm dark matter with stellar streams through neural simulation-based inference
Joeri Hermans, N. Banik, Christoph Weniger, G. Bertone, Gilles Louppe (30 Nov 2020)

On Generalization of Adaptive Methods for Over-parameterized Linear Regression
Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi (28 Nov 2020) [AI4CE]

Cross-Camera Convolutional Color Constancy
Mahmoud Afifi, Jonathan T. Barron, Chloe LeGendre, Yun-Ta Tsai, Francois Bleibel (24 Nov 2020)

Long Short Term Memory Networks for Bandwidth Forecasting in Mobile Broadband Networks under Mobility
Konstantinos Kousias, A. Pappas, Özgü Alay, A. Argyriou, Michael Riegler (20 Nov 2020)

On tuning deep learning models: a data mining perspective
M. Öztürk (19 Nov 2020)

Contrastive Weight Regularization for Large Minibatch SGD
Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu (17 Nov 2020) [OffRL]

Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
Lorenzo Valerio, F. M. Nardini, A. Passarella, R. Perego (17 Nov 2020)

Adaptive Federated Dropout: Improving Communication Efficiency and Generalization for Federated Learning
Nader Bouacida, Jiahui Hou, H. Zang, Xin Liu (08 Nov 2020) [FedML]

Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein (04 Nov 2020)

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos (29 Oct 2020)

Stochastic Optimization with Laggard Data Pipelines
Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang (26 Oct 2020)