Don't Decay the Learning Rate, Increase the Batch Size
Versions: v1, v2 (latest)

1 November 2017
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
ODL
arXiv (abs) · PDF · HTML
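For readers skimming the citation list, the recipe named in the title can be stated in a few lines: rather than decaying the learning rate at a phase boundary, hold it fixed and enlarge the batch size by the same factor, which lowers the gradient noise in much the same way. The sketch below is illustrative only, not the authors' code; the toy dataset, model, phase lengths, and growth factor are arbitrary assumptions.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy data and model; all names and sizes here are illustrative assumptions.
    train_dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    batch_size = 32
    phases = [5, 5, 5]   # epochs per training phase
    growth = 4           # batch-size multiplier used in place of a 1/4 LR decay

    for epochs in phases:
        # Rebuild the loader each phase so the new batch size takes effect.
        loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        for _ in range(epochs):
            for x, y in loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        # Instead of dividing optimizer.param_groups[0]["lr"] by `growth`,
        # keep the learning rate fixed and grow the batch.
        batch_size *= growth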

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown
Large Scale Structure of Neural Network Loss Landscapes
  Stanislav Fort, Stanislaw Jastrzebski · 11 Jun 2019
Federated Learning for Emoji Prediction in a Mobile Keyboard
  Swaroop Indra Ramaswamy, Rajiv Mathews, Kanishka Rao, Françoise Beaufays · FedML · 11 Jun 2019
Making Asynchronous Stochastic Gradient Descent Work for Transformers
  Alham Fikri Aji, Kenneth Heafield · 08 Jun 2019
Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods
  Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan · 07 Jun 2019
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
  Devansh Arpit, Victor Campos, Yoshua Bengio · 05 Jun 2019
Training Neural Response Selection for Task-Oriented Dialogue Systems
  Matthew Henderson, Ivan Vulić, D. Gerz, I. Casanueva, Paweł Budzianowski, Sam Coope, Georgios P. Spithourakis, Tsung-Hsien Wen, N. Mrksic, Pei-hao Su · 04 Jun 2019
An Empirical Study on Hyperparameters and their Interdependence for RL Generalization
  Xingyou Song, Yilun Du, Jacob Jackson · AI4CE · 02 Jun 2019
Improving Model Training by Periodic Sampling over Weight Distributions
  S. Tripathi, Jiayi Liu, Unmesh Kurup, Mohak Shah, Sauptik Dhar · 14 May 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
  Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood · FedML, AI4CE · 13 May 2019
On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization
  Hao Yu, Rong Jin · 10 May 2019
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
  Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith · 09 May 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
  Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li · 26 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
  Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh · ODL · 01 Apr 2019
Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
  Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima · 29 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
  Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney · 14 Mar 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
  Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba · ODL · 21 Feb 2019
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
  Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen · AI4CE · 19 Feb 2019
LocalNorm: Robust Image Classification through Dynamically Regularized Normalization
  Bojian Yin, S. Schaafsma, Henk Corporaal, H. Scholte, S. Bohté · 18 Feb 2019
Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning
  Andrew Silva, Matthew C. Gombolay · OffRL · 15 Feb 2019
Towards Federated Learning at Scale: System Design
  Keith Bonawitz, Hubert Eichner, W. Grieskamp, Dzmitry Huba, A. Ingerman, ..., H. B. McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander · FedML · 04 Feb 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
  Haowei He, Gao Huang, Yang Yuan · ODL, MLT · 02 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
  P. Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, ..., Aidan Clark, Sergio Gomez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov · GNN, OffRL, AI4CE · 01 Feb 2019
Augment your batch: better training with larger batches
  Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry · ODL · 27 Jan 2019
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
  Sangkug Lym, Esha Choukse, Siavash Zangeneh, W. Wen, Sujay Sanghavi, M. Erez · CVBM · 26 Jan 2019
Large-Batch Training for LSTM and Beyond
  Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh · 24 Jan 2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
  Jinrong Guo, Wantao Liu, Wang Wang, Q. Lu, Songlin Hu, Jizhong Han, Ruixuan Li · 21 Jan 2019
Adapting Convolutional Neural Networks for Geographical Domain Shift
  Pavel Ostyakov, Sergey I. Nikolenko · OOD · 18 Jan 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban · 18 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
  A. Koliousis, Pijika Watcharapichat, Matthias Weidlich, Kai Zou, Paolo Costa, Peter R. Pietzuch · 08 Jan 2019
An Empirical Model of Large-Batch Training
  Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team · 14 Dec 2018
On Batch Orthogonalization Layers
  J. Blanchette, R. Laganière · BDL, OOD · 07 Dec 2018
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
  Saurabh N. Adya, Vinay Palakkode, Oncel Tuzel · 07 Dec 2018
Parameter Re-Initialization through Cyclical Batch Size Schedules
  Norman Mu, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney · ODL · 04 Dec 2018
Bag of Tricks for Image Classification with Convolutional Neural Networks
  Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li · 04 Dec 2018
Stochastic Training of Residual Networks: a Differential Equation Viewpoint
  Qi Sun, Yunzhe Tao, Q. Du · 01 Dec 2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
  Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez · 30 Nov 2018
LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks
  Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath · 30 Nov 2018
A Machine-Learning Phase Classification Scheme for Anomaly Detection in Signals with Periodic Characteristics
  Lia Ahrens, Julian Ahrens, Hans D. Schotten · 29 Nov 2018
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
  Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka · ODL · 29 Nov 2018
Deep learning for pedestrians: backpropagation in CNNs
  L. Boué · 3DV, PINN · 29 Nov 2018
Neural Sign Language Translation based on Human Keypoint Estimation
  Sang-Ki Ko, Chang Jo Kim, Hyedong Jung, Choongsang Cho · SLR · 28 Nov 2018
Hydra: A Peer to Peer Distributed Training & Data Collection Framework
  Vaibhav Mathur, K. Chahal · OffRL · 24 Nov 2018
Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling
  Haoran You, Yu Cheng, Tianheng Cheng, Chunliang Li, Pan Zhou · GAN · 19 Nov 2018
Minimum weight norm models do not always generalize well for over-parameterized problems
  Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi · 16 Nov 2018
Image Classification at Supercomputer Scale
  Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng · VLM · 16 Nov 2018
Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
  Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama · 13 Nov 2018
Evaluating the Ability of LSTMs to Learn Context-Free Grammars
  Luzi Sennhauser, Robert C. Berwick · 06 Nov 2018
Neural Likelihoods via Cumulative Distribution Functions
  Pawel M. Chilinski, Ricardo M. A. Silva · UQCV · 02 Nov 2018
Machine Translation between Vietnamese and English: an Empirical Study
  Hong-Hai Phan-Vu, Viet-Trung Tran, V. Nguyen, Hoang-Vu Dang, Phan-Thuan Do · 30 Oct 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
  Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher · ODL · 29 Oct 2018