Don't Decay the Learning Rate, Increase the Batch Size


1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
    ODL

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 168 papers shown
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu
Josef van Genabith
Deyi Xiong
Qiuhui Liu
14
10
0
05 May 2020
DIET: Lightweight Language Understanding for Dialogue Systems
Tanja Bunk
Daksh Varshneya
Vladimir Vlasov
Alan Nichol
27
160
0
21 Apr 2020
On Learning Rates and Schrödinger Operators
Bin Shi
Weijie J. Su
Michael I. Jordan
14
60
0
15 Apr 2020
Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh
Oded Ben-David
H. Sompolinsky
Zohar Ringel
19
20
0
02 Apr 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang
Dezun Dong
Yemao Xu
Liquan Xiao
30
12
0
06 Mar 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
159
235
0
04 Mar 2020
Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function
M. Kawanaka
Yuma Koizumi
Ryoichi Miyazaki
Kohei Yatabe
AAML
27
22
0
14 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
28
17
0
10 Feb 2020
Data-Driven Permanent Magnet Temperature Estimation in Synchronous Motors with Supervised Machine Learning
Wilhelm Kirchgässner
Oliver Wallscheid
J. Böcker
23
68
0
17 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
38
55
0
07 Jan 2020
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
25
168
0
19 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
T. Nguyen
Animesh Garg
Richard G. Baraniuk
Anima Anandkumar
TPM
28
9
0
09 Dec 2019
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
20
312
0
04 Dec 2019
A Multigrid Method for Efficiently Training Video Models
Chaoxia Wu
Ross B. Girshick
Kaiming He
Christoph Feichtenhofer
Philipp Krähenbühl
21
94
0
02 Dec 2019
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
13
621
0
13 Nov 2019
Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels
Yihan Jiang
Hyeji Kim
Himanshu Asnani
Sreeram Kannan
Sewoong Oh
Pramod Viswanath
30
134
0
08 Nov 2019
Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha
Hang Zhang
Anirudh Goyal
Yoshua Bengio
Hugo Larochelle
Augustus Odena
GAN
38
72
0
29 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
24
15
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
59
70
0
09 Oct 2019
Stochastic gradient descent for hybrid quantum-classical optimization
R. Sweke
Frederik Wilde
Johannes Jakob Meyer
Maria Schuld
Paul K. Fährmann
Barthélémy Meynard-Piganeau
Jens Eisert
17
236
0
02 Oct 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer
Berry Weinstein
Itay Hubara
Tal Ben-Nun
Torsten Hoefler
Daniel Soudry
29
20
0
12 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
34
55
0
30 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
31
1
0
23 Jul 2019
The University of Edinburgh's Submissions to the WMT19 News Translation Task
Rachel Bawden
Nikolay Bogoychev
Ulrich Germann
Roman Grundkiewicz
Faheem Kirefu
Antonio Valerio Miceli Barone
Alexandra Birch
22
32
0
12 Jul 2019
The Adversarial Robustness of Sampling
Omri Ben-Eliezer
E. Yogev
TTA
AAML
26
45
0
26 Jun 2019
Federated Learning for Emoji Prediction in a Mobile Keyboard
Swaroop Indra Ramaswamy
Rajiv Mathews
Kanishka Rao
Françoise Beaufays
FedML
21
309
0
11 Jun 2019
An Empirical Study on Hyperparameters and their Interdependence for RL Generalization
Xingyou Song
Yilun Du
Jacob Jackson
AI4CE
24
8
0
02 Jun 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Wu Dong
Murat Keçeli
Rafael Vescovi
Hanyu Li
Corey Adams
...
T. Uram
V. Vishwanath
N. Ferrier
B. Kasthuri
P. Littlewood
FedML
AI4CE
19
9
0
13 May 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng
Hang Zhang
Yifei Ma
Tong He
Zhi-Li Zhang
Sheng Zha
Mu Li
25
23
0
26 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
28
980
0
01 Apr 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen
Kevin Luk
Maxime Gazeau
Guodong Zhang
Harris Chan
Jimmy Ba
ODL
20
22
0
21 Feb 2019
Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning
Andrew Silva
Matthew C. Gombolay
OffRL
27
20
0
15 Feb 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He
Gao Huang
Yang Yuan
ODL
MLT
28
147
0
02 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
P. Buchlovsky
David Budden
Dominik Grewe
Chris Jones
John Aslanides
...
Aidan Clark
Sergio Gomez Colmenarejo
Aedan Pope
Fabio Viola
Dan Belov
GNN
OffRL
AI4CE
37
20
0
01 Feb 2019
Augment your batch: better training with larger batches
Elad Hoffer
Tal Ben-Nun
Itay Hubara
Niv Giladi
Torsten Hoefler
Daniel Soudry
ODL
30
72
0
27 Jan 2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo
Wantao Liu
Wang Wang
Q. Lu
Songlin Hu
Jizhong Han
Ruixuan Li
16
9
0
21 Jan 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
26
237
0
18 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Luo Mai
Paolo Costa
Peter R. Pietzuch
19
69
0
08 Jan 2019
Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
30
8
0
04 Dec 2018
LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks
Yihan Jiang
Hyeji Kim
Himanshu Asnani
Sreeram Kannan
Sewoong Oh
Pramod Viswanath
38
79
0
30 Nov 2018
Neural Sign Language Translation based on Human Keypoint Estimation
Sang-Ki Ko
Chang Jo Kim
Hyedong Jung
Choongsang Cho
SLR
30
207
0
28 Nov 2018
Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling
Haoran You
Yu Cheng
Tianheng Cheng
Chunliang Li
Pan Zhou
GAN
29
3
0
19 Nov 2018
Image Classification at Supercomputer Scale
Chris Ying
Sameer Kumar
Dehao Chen
Tao Wang
Youlong Cheng
VLM
11
122
0
16 Nov 2018
A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
K. Chahal
Manraj Singh Grover
Kuntal Dey
3DH
OOD
6
53
0
28 Oct 2018
Applying Deep Learning To Airbnb Search
Malay Haldar
Mustafa Abdool
Prashant Ramanathan
Tao Xu
Shulin Yang
...
Qing Zhang
Nick Barrow-Williams
B. Turnbull
Brendan M. Collins
Thomas Legrand
DML
23
83
0
22 Oct 2018
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
33
231
0
19 Oct 2018
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin
Michael W. Mahoney
AI4CE
44
191
0
02 Oct 2018
Normalization in Training U-Net for 2D Biomedical Semantic Segmentation
Xiao-Yun Zhou
Guang-Zhong Yang
18
77
0
11 Sep 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
57
429
0
22 Aug 2018
Perturbative Neural Networks
Felix Juefei Xu
Vishnu Boddeti
Marios Savvides
24
37
0
05 Jun 2018