ResearchTrend.AI
Don't Decay the Learning Rate, Increase the Batch Size
arXiv:1711.00489 (v2, latest) · 1 November 2017 · [ODL]
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
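The titular recipe can be sketched in a few lines: instead of dividing the learning rate by some factor at each milestone epoch, multiply the batch size by the same factor and keep the learning rate fixed, so the scale of the SGD gradient noise decays identically. The milestone epochs, base values, and factor below are illustrative placeholders, not values taken from the paper:

```python
def batch_size_schedule(epoch, base_batch=128, base_lr=0.1,
                        milestones=(30, 60, 80), factor=5):
    """Return (batch_size, learning_rate) for a given epoch.

    A conventional schedule would divide base_lr by `factor` at each
    milestone; here the batch size is multiplied by `factor` instead,
    which shrinks the SGD noise scale (~ lr / batch_size) the same way.
    """
    k = sum(epoch >= m for m in milestones)  # milestones already passed
    return base_batch * factor ** k, base_lr

# Usage: at epoch 0 the batch is 128; after the first milestone it
# becomes 640, then 3200, while the learning rate stays at 0.1.
```

In practice the batch size cannot grow past memory or dataset limits, so the paper's experiments eventually fall back to decaying the learning rate once the batch size saturates.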

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown

  • A Hitchhiker's Guide On Distributed Training of Deep Neural Networks (28 Oct 2018): K. Chahal, Manraj Singh Grover, Kuntal Dey [3DHOOD]
  • Applying Deep Learning To Airbnb Search (22 Oct 2018): Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, ..., Qing Zhang, Nick Barrow-Williams, B. Turnbull, Brendan M. Collins, Thomas Legrand [DML]
  • A Modern Take on the Bias-Variance Tradeoff in Neural Networks (19 Oct 2018): Brady Neal, Sarthak Mittal, A. Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
  • Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD (19 Oct 2018): Jianyu Wang, Gauri Joshi [FedML]
  • Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks (16 Oct 2018): Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
  • Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (02 Oct 2018): Charles H. Martin, Michael W. Mahoney [AI4CE]
  • Large batch size training of neural networks with adversarial training and second-order information (02 Oct 2018): Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney [ODL]
  • Dynamic Sparse Graph for Efficient Deep Learning (01 Oct 2018): Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, Yuan Xie [GNN]
  • Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning (29 Sep 2018): Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
  • Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference (11 Sep 2018): J. McKinstry, S. K. Esser, R. Appuswamy, Deepika Bablani, John V. Arthur, Izzet B. Yildiz, D. Modha [MQ]
  • Normalization in Training U-Net for 2D Biomedical Semantic Segmentation (11 Sep 2018): Xiao-Yun Zhou, Guang-Zhong Yang
  • Single-Microphone Speech Enhancement and Separation Using Deep Learning (31 Aug 2018): Morten Kolbaek
  • The University of Cambridge's Machine Translation Systems for WMT18 (28 Aug 2018): Felix Stahlberg, Adria de Gispert, Bill Byrne
  • Don't Use Large Mini-Batches, Use Local SGD (22 Aug 2018): Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
  • Fast, Better Training Trick -- Random Gradient (13 Aug 2018): Jiakai Wei [ODL]
  • Large Scale Language Modeling: Converging on 40GB of Text in Four Hours (03 Aug 2018): Raul Puri, Robert M. Kirby, Nikolai Yakovenko, Bryan Catanzaro
  • Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (30 Jul 2018): Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  • An argument in favor of strong scaling for deep neural networks with small datasets (24 Jul 2018): R. L. F. Cunha, Eduardo Rodrigues, Matheus Palhares Viana, Dario Augusto Borges Oliveira
  • Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations (01 Jul 2018): Jennifer B. Erway, J. Griffin, Roummel F. Marcia, Riadh Omheni
  • Stochastic natural gradient descent draws posterior samples in function space (25 Jun 2018): Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein [BDL]
  • Pushing the boundaries of parallel Deep Learning -- A practical approach (25 Jun 2018): Paolo Viviani, M. Drocco, Marco Aldinucci [OOD]
  • Character-Level Feature Extraction with Densely Connected Networks (24 Jun 2018): Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim [3DV]
  • Kernel machines that adapt to GPUs for effective large batch training (15 Jun 2018): Siyuan Ma, M. Belkin
  • Perturbative Neural Networks (05 Jun 2018): Felix Juefei Xu, Vishnu Boddeti, Marios Savvides
  • Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate (05 Jun 2018): Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry [FedMLMLT]
  • Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark (04 Jun 2018): Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, K. Olukotun, Christopher Ré, Matei A. Zaharia
  • Implicit Bias of Gradient Descent on Linear Convolutional Networks (01 Jun 2018): Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [MDE]
  • Scaling Neural Machine Translation (01 Jun 2018): Myle Ott, Sergey Edunov, David Grangier, Michael Auli [AIMat]
  • Understanding Batch Normalization (01 Jun 2018): Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger
  • Gradient Energy Matching for Distributed Asynchronous Gradient Descent (22 May 2018): Joeri Hermans, Gilles Louppe
  • SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning (21 May 2018): W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li
  • Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT (01 May 2018): Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne
  • SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (24 Apr 2018): Michael Petrochuk, Luke Zettlemoyer
  • BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism (23 Apr 2018): Nicolas Weber, F. Schmidt, Mathias Niepert, Felipe Huici
  • Revisiting Small Batch Training for Deep Neural Networks (20 Apr 2018): Dominic Masters, Carlo Luschi [ODL]
  • μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching (13 Apr 2018): Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
  • Training Tips for the Transformer Model (01 Apr 2018): Martin Popel, Ondrej Bojar
  • A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay (26 Mar 2018): L. Smith
  • Norm matters: efficient and accurate normalization schemes in deep networks (05 Mar 2018): Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry [OffRL]
  • Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis (26 Feb 2018): Tal Ben-Nun, Torsten Hoefler [GNN]
  • A Walk with SGD (24 Feb 2018): Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
  • Computation of optimal transport and related hedging problems via penalization and neural networks (23 Feb 2018): Stephan Eckstein, Michael Kupper [OT]
  • Characterizing Implicit Bias in Terms of Optimization Geometry (22 Feb 2018): Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [AI4CE]
  • Hessian-based Analysis of Large Batch Training and Robustness to Adversaries (22 Feb 2018): Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
  • A Progressive Batching L-BFGS Method for Machine Learning (15 Feb 2018): Raghu Bollapragada, Dheevatsa Mudigere, J. Nocedal, Hao-Jun Michael Shi, P. T. P. Tang [ODL]
  • On Characterizing the Capacity of Neural Networks using Algebraic Topology (13 Feb 2018): William H. Guss, Ruslan Salakhutdinov
  • On Scale-out Deep Learning Training for Cloud and HPC (24 Jan 2018): Srinivas Sridharan, K. Vaidyanathan, Dhiraj D. Kalamkar, Dipankar Das, Mikhail E. Smorkalov, ..., Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey [BDL]
  • The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning (18 Dec 2017): Siyuan Ma, Raef Bassily, M. Belkin
  • AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks (06 Dec 2017): Aditya Devarakonda, Maxim Naumov, M. Garland [ODL]
  • A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit (17 Nov 2017): S. Cho, Sunghun Kang, Chang D. Yoo