1711.00489
Don't Decay the Learning Rate, Increase the Batch Size
1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"
50 / 454 papers shown
Title
Investigating the interaction between gradient-only line searches and different activation functions
D. Kafka
D. Wilke
48
0
0
23 Feb 2020
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
Karsten Roth
Timo Milbich
Samarth Sinha
Prateek Gupta
Bjorn Ommer
Joseph Paul Cohen
187
173
0
19 Feb 2020
Rethinking the Hyperparameters for Fine-tuning
Hao Li
Pratik Chaudhari
Hao Yang
Michael Lam
Avinash Ravichandran
Rahul Bhotika
Stefano Soatto
VLM
93
130
0
19 Feb 2020
Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function
M. Kawanaka
Yuma Koizumi
Ryoichi Miyazaki
Kohei Yatabe
AAML
70
23
0
14 Feb 2020
Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
Umut Simsekli
Lingjiong Zhu
Yee Whye Teh
Mert Gurbuzbalaban
92
50
0
13 Feb 2020
Scalable and Practical Natural Gradient for Large-Scale Deep Learning
Kazuki Osawa
Yohei Tsuji
Yuichiro Ueno
Akira Naruse
Chuan-Sheng Foo
Rio Yokota
90
37
0
13 Feb 2020
Black-Box Optimization with Local Generative Surrogates
S. Shirobokov
V. Belavin
Michael Kagan
Andrey Ustyuzhanin
A. G. Baydin
60
3
0
11 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
127
17
0
10 Feb 2020
Depthwise-STFT based separable Convolutional Neural Networks
Sudhakar Kumawat
Shanmuganathan Raman
OOD
MDE
50
5
0
27 Jan 2020
Variance Reduction with Sparse Gradients
Melih Elibol
Lihua Lei
Michael I. Jordan
67
23
0
27 Jan 2020
Data-Driven Permanent Magnet Temperature Estimation in Synchronous Motors with Supervised Machine Learning
Wilhelm Kirchgässner
Oliver Wallscheid
J. Böcker
41
70
0
17 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
88
60
0
07 Jan 2020
CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul
B. Kijsirikul
ODL
40
3
0
24 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
137
169
0
19 Dec 2019
On the Bias-Variance Tradeoff: Textbooks Need an Update
Brady Neal
43
18
0
17 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle
Gintare Karolina Dziugaite
Daniel M. Roy
Michael Carbin
MoMe
201
630
0
11 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
T. Nguyen
Animesh Garg
Richard G. Baraniuk
Anima Anandkumar
TPM
113
9
0
09 Dec 2019
Observational Overfitting in Reinforcement Learning
Xingyou Song
Yiding Jiang
Stephen Tu
Yilun Du
Behnam Neyshabur
OffRL
134
140
0
06 Dec 2019
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
142
332
0
04 Dec 2019
A Multigrid Method for Efficiently Training Video Models
Chaoxia Wu
Ross B. Girshick
Kaiming He
Christoph Feichtenhofer
Philipp Krahenbuhl
95
94
0
02 Dec 2019
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli
Mert Gurbuzbalaban
T. H. Nguyen
G. Richard
Levent Sagun
88
59
0
29 Nov 2019
Stage-based Hyper-parameter Optimization for Deep Learning
Ahnjae Shin
Dongjin Shin
Sungwoo Cho
Do Yoon Kim
Eunji Jeong
Gyeong-In Yu
Byung-Gon Chun
31
4
0
24 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
110
656
0
13 Nov 2019
Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels
Yihan Jiang
Hyeji Kim
Himanshu Asnani
Sreeram Kannan
Sewoong Oh
Pramod Viswanath
69
138
0
08 Nov 2019
Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha
Hang Zhang
Anirudh Goyal
Yoshua Bengio
Hugo Larochelle
Augustus Odena
GAN
99
77
0
29 Oct 2019
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Koyel Mukherjee
Alind Khare
Ashish Verma
76
15
0
25 Oct 2019
Fast Exact Matrix Completion: A Unified Optimization Framework for Matrix Completion
Dimitris Bertsimas
M. Li
67
2
0
21 Oct 2019
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Matteo Sordello
Niccolò Dalmasso
Hangfeng He
Weijie Su
68
7
0
18 Oct 2019
Improving the convergence of SGD through adaptive batch sizes
Scott Sievert
Zachary B. Charles
ODL
74
8
0
18 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
106
15
0
11 Oct 2019
Blink: Fast and Generic Collectives for Distributed ML
Guanhua Wang
Shivaram Venkataraman
Amar Phanishayee
J. Thelin
Nikhil R. Devanur
Ion Stoica
VLM
67
142
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
106
70
0
09 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt
Cameron R. Wolfe
Michael Lee
Yuxin Tang
Anastasios Kyrillidis
Christopher M. Jermaine
OOD
92
39
0
04 Oct 2019
SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
A. Masullo
Ligang He
Toby Perrett
Rui Mao
Carsten Maple
Majid Mirmehdi
111
319
0
03 Oct 2019
Stochastic gradient descent for hybrid quantum-classical optimization
R. Sweke
Frederik Wilde
Johannes Jakob Meyer
Maria Schuld
Paul K. Fährmann
Barthélémy Meynard-Piganeau
Jens Eisert
105
241
0
02 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Ji Lin
Chuang Gan
Song Han
78
10
0
01 Oct 2019
Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs
Yuxian Meng
Xiangyuan Ren
Zijun Sun
Xiaoya Li
Arianna Yuan
Leilei Gan
Jiwei Li
AIMat
AI4CE
62
8
0
26 Sep 2019
Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle
Michael Kaufmann
K. Kourtis
Celestine Mendler-Dünner
Adrian Schüpbach
Thomas Parnell
18
0
0
11 Sep 2019
Neural Architecture Search in Embedding Space
Chunmiao Liu
64
0
0
09 Sep 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer
Berry Weinstein
Itay Hubara
Tal Ben-Nun
Torsten Hoefler
Daniel Soudry
115
20
0
12 Aug 2019
EdgeNet: Semantic Scene Completion from a Single RGB-D Image
Aloisio Dourado
Teofilo de Campos
Hansung Kim
A. Hilton
3DV
3DPC
67
18
0
08 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
92
59
0
30 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
71
1
0
23 Jul 2019
Adaptive Weight Decay for Deep Neural Networks
Kensuke Nakamura
Byung-Woo Hong
63
43
0
21 Jul 2019
The University of Edinburgh's Submissions to the WMT19 News Translation Task
Rachel Bawden
Nikolay Bogoychev
Ulrich Germann
Roman Grundkiewicz
Faheem Kirefu
Antonio Valerio Miceli Barone
Alexandra Birch
59
32
0
12 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li
Colin Wei
Tengyu Ma
93
300
0
10 Jul 2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang
Lala Li
Zachary Nado
James Martens
Sushant Sachdeva
George E. Dahl
Christopher J. Shallue
Roger C. Grosse
128
154
0
09 Jul 2019
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
A. G. Baydin
Lei Shao
W. Bhimji
Lukas Heinrich
Lawrence Meadows
...
Philip Torr
Victor W. Lee
Kyle Cranmer
P. Prabhat
Frank Wood
82
58
0
08 Jul 2019
EPNAS: Efficient Progressive Neural Architecture Search
Yanqi Zhou
Peng Wang
Sercan O. Arik
Haonan Yu
Syed Zawad
Feng Yan
G. Diamos
47
5
0
07 Jul 2019
The Adversarial Robustness of Sampling
Omri Ben-Eliezer
E. Yogev
TTA
AAML
63
48
0
26 Jun 2019