Don't Decay the Learning Rate, Increase the Batch Size (arXiv 1711.00489)
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (1 November 2017) [ODL]
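The paper's titular recipe amounts to replacing a stepwise learning-rate decay with a proportional stepwise increase in the batch size, keeping the learning rate fixed. Below is a minimal NumPy sketch of that substitution (not the authors' code); the toy least-squares problem, the initial batch size of 128, the schedule epochs 30/60/80, and the growth factor of 5 are illustrative assumptions, not values from the paper.

```python
# Sketch: hold the learning rate constant and grow the batch size at the
# epochs where a step-decay schedule would have shrunk the learning rate.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))            # toy regression inputs
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=10_000)

w = np.zeros(20)
lr = 0.1                                     # never decayed
batch_size = 128                             # grows instead
schedule = {30: 5, 60: 5, 80: 5}             # epoch -> batch-size factor

for epoch in range(100):
    if epoch in schedule:
        batch_size *= schedule[epoch]        # "increase the batch size" step
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # mini-batch gradient
        w -= lr * grad
```

The paper further combines this swap with raising the learning rate and momentum so that even larger batches can be used; the sketch shows only the basic substitution.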
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size" (showing 50 of 454)
How Data Augmentation affects Optimization for Linear Regression
Boris Hanin, Yi Sun (21 Oct 2020)

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov (14 Oct 2020)

WHO 2016 subtyping and automated segmentation of glioma using multi-task deep learning
S. V. D. Voort, Fatih Incekara, M. Wijnenga, G. Kapsas, R. Gahrmann, ..., A. Vincent, W. Niessen, M. Bent, M. Smits, S. Klein (09 Oct 2020)

Genetic-algorithm-optimized neural networks for gravitational wave classification
Dwyer Deighan, Scott E. Field, C. Capano, G. Khanna (09 Oct 2020)

COVID-19 Classification Using Staked Ensembles: A Comprehensive Analysis
B. LalithBharadwaj, Rohit Boddeda, K. Vardhan, G. Madhu (07 Oct 2020)

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora (06 Oct 2020)

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda (28 Sep 2020)
Improved Modeling of 3D Shapes with Multi-view Depth Maps
Kamal Gupta, Susmija Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker (07 Sep 2020) [3DV]

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin (05 Sep 2020) [ODL]

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing (27 Aug 2020) [VLM]

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé (19 Aug 2020)

A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu (10 Aug 2020)

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha (24 Jul 2020) [FedML]

On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis (15 Jul 2020)
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram (14 Jul 2020)

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal (13 Jul 2020)

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin (09 Jul 2020) [ODL]

Coded Distributed Computing with Partial Recovery
Emre Ozfatura, S. Ulukus, Deniz Gunduz (04 Jul 2020)

Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han, Junbin Gao (03 Jul 2020)

Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
D. Kafka, D. Wilke (29 Jun 2020) [ODL]

Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle Pérez, Joar Skalse, A. Louis (26 Jun 2020) [BDL]

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De (26 Jun 2020) [MLT]
Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma (24 Jun 2020)

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun (22 Jun 2020)

How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath, Amit Deshpande, K. Subrahmanyam (20 Jun 2020) [AAML]

An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi, Zhishuai Guo, Yi Tian Xu, Rong Jin, Tianbao Yang (17 Jun 2020)

Fine-Grained Stochastic Architecture Search
S. Chaudhuri, Elad Eban, Hanhan Li, Max Moroz, Yair Movshovitz-Attias (17 Jun 2020)

Gradient Amplification: An efficient way to train deep neural networks
S. Basodi, Chunyan Ji, Haiping Zhang, Yi Pan (16 Jun 2020) [ODL]

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts (16 Jun 2020) [ODL]

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma (15 Jun 2020)
The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh (15 Jun 2020)

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh (12 Jun 2020) [CLL]

Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Michael T. McCann, S. Ravishankar (09 Jun 2020)

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran (15 May 2020) [MLT]

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao (14 May 2020)

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu (05 May 2020)

Adaptive Learning of the Optimal Batch Size of SGD
Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Guohao Li, Peter Richtárik (03 May 2020)

Dynamic backup workers for parallel machine learning
Chuan Xu, Giovanni Neglia, Nicola Sebastianelli (30 Apr 2020)

DIET: Lightweight Language Understanding for Dialogue Systems
Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol (21 Apr 2020)
On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan (15 Apr 2020)

Stochastic batch size for adaptive regularization in deep network optimization
Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong (14 Apr 2020) [ODL]

Understanding Learning Dynamics for Neural Machine Translation
Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi (05 Apr 2020)

Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh, Oded Ben-David, H. Sompolinsky, Zohar Ringel (02 Apr 2020)

Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi (25 Mar 2020)

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali, Yan Sun, Robert Tibshirani (17 Mar 2020)

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020)

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari (04 Mar 2020) [ODL]
Stagewise Enlargement of Batch Size for SGD-based Learning
Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li (26 Feb 2020)

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkateswara Dasari, S. E. Rouayheb (25 Feb 2020)

Baryon acoustic oscillations reconstruction using convolutional neural networks
Tianxiang Mao, Jie-Shuang Wang, Baojiu Li, Yan-Chuan Cai, B. Falck, M. Neyrinck, A. Szalay (24 Feb 2020)