Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
1 November 2017 · arXiv:1711.00489 · v2 (latest) · ODL
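
The title is the method in one line: wherever a step schedule would divide the learning rate ε by some factor, multiply the batch size B by that factor instead. The paper argues this leaves the scale of the SGD gradient noise, g ≈ εN/B for a training set of size N, roughly unchanged, so training follows a similar trajectory while running on larger, more parallelizable batches. A minimal sketch in plain Python; the function name, milestones, and growth factor are illustrative assumptions, not values prescribed by the paper:

```python
# Sketch: increase the batch size at schedule milestones instead of
# decaying the learning rate. Milestones, factor, and base batch size
# are illustrative assumptions, not the paper's experimental settings.

def batch_size_schedule(epoch, base_batch=256, factor=5, milestones=(60, 80)):
    """Batch size to use at `epoch`.

    Where step decay would divide the learning rate by `factor` at each
    milestone, we multiply the batch size instead; with a fixed learning
    rate this keeps the noise scale g ~ lr * N / batch roughly constant.
    """
    batch = base_batch
    for m in milestones:
        if epoch >= m:
            batch *= factor
    return batch

# With milestones (60, 80) and factor 5: epochs 0-59 use batch 256,
# epochs 60-79 use 1280, and epochs 80+ use 6400.
for epoch in (0, 59, 60, 80):
    print(epoch, batch_size_schedule(epoch))
```

A natural fallback once the batch size hits a memory or scaling ceiling is to hold it there and apply any remaining decay steps to the learning rate as usual.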

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"
50 of 454 papers shown

How Data Augmentation affects Optimization for Linear Regression
Boris Hanin, Yi Sun
21 Oct 2020 · 86 / 16 / 0

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
14 Oct 2020 · 90 / 221 / 0

WHO 2016 subtyping and automated segmentation of glioma using multi-task deep learning
S. V. D. Voort, Fatih Incekara, M. Wijnenga, G. Kapsas, R. Gahrmann, ..., A. Vincent, W. Niessen, M. Bent, M. Smits, S. Klein
09 Oct 2020 · 37 / 7 / 0

Genetic-algorithm-optimized neural networks for gravitational wave classification
Dwyer Deighan, Scott E. Field, C. Capano, G. Khanna
09 Oct 2020 · 57 / 22 / 0

COVID-19 Classification Using Staked Ensembles: A Comprehensive Analysis
B. LalithBharadwaj, Rohit Boddeda, K. Vardhan, G. Madhu
07 Oct 2020 · 28 / 1 / 0

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
06 Oct 2020 · 112 / 75 / 0

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda
28 Sep 2020 · 50 / 3 / 0
Improved Modeling of 3D Shapes with Multi-view Depth Maps
Kamal Gupta, Susmija Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker
07 Sep 2020 · 3DV · 43 / 5 / 0

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin
05 Sep 2020 · ODL · 58 / 7 / 0

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing
27 Aug 2020 · VLM · 79 / 183 / 0

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé
19 Aug 2020 · 114 / 78 / 0

A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu
10 Aug 2020 · 84 / 112 / 0

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha
24 Jul 2020 · FedML · 40 / 3 / 0

On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis
15 Jul 2020 · 55 / 12 / 0
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram
14 Jul 2020 · 86 / 110 / 0

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal
13 Jul 2020 · 54 / 5 / 0

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
09 Jul 2020 · ODL · 90 / 37 / 0

Coded Distributed Computing with Partial Recovery
Emre Ozfatura, S. Ulukus, Deniz Gunduz
04 Jul 2020 · 71 / 29 / 0

Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han, Junbin Gao
03 Jul 2020 · 85 / 5 / 0

Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
D. Kafka, D. Wilke
29 Jun 2020 · ODL · 43 / 0 / 0

Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle Pérez, Joar Skalse, A. Louis
26 Jun 2020 · BDL · 83 / 53 / 0

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De
26 Jun 2020 · MLT · 62 / 100 / 0
Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma
24 Jun 2020 · 57 / 9 / 0

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun
22 Jun 2020 · 52 / 4 / 0

How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath, Amit Deshpande, K. Subrahmanyam
20 Jun 2020 · AAML · 44 / 3 / 0

An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi, Zhishuai Guo, Yi Tian Xu, Rong Jin, Tianbao Yang
17 Jun 2020 · 117 / 47 / 0

Fine-Grained Stochastic Architecture Search
S. Chaudhuri, Elad Eban, Hanhan Li, Max Moroz, Yair Movshovitz-Attias
17 Jun 2020 · 40 / 8 / 0

Gradient Amplification: An efficient way to train deep neural networks
S. Basodi, Chunyan Ji, Haiping Zhang, Yi Pan
16 Jun 2020 · ODL · 55 / 116 / 0

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts
16 Jun 2020 · ODL · 148 / 50 / 0
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
15 Jun 2020 · 219 / 95 / 0

The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh
15 Jun 2020 · 121 / 15 / 0

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh
12 Jun 2020 · CLL · 81 / 228 / 0

Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Michael T. McCann, S. Ravishankar
09 Jun 2020 · 47 / 8 / 0

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
15 May 2020 · MLT · 64 / 21 / 0

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao
14 May 2020 · 47 / 7 / 0

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu
05 May 2020 · 47 / 11 / 0

Adaptive Learning of the Optimal Batch Size of SGD
Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Guohao Li, Peter Richtárik
03 May 2020 · 48 / 5 / 0
Dynamic backup workers for parallel machine learning
Chuan Xu, Giovanni Neglia, Nicola Sebastianelli
30 Apr 2020 · 72 / 11 / 0

DIET: Lightweight Language Understanding for Dialogue Systems
Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol
21 Apr 2020 · 74 / 162 / 0

On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan
15 Apr 2020 · 97 / 61 / 0

Stochastic batch size for adaptive regularization in deep network optimization
Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
14 Apr 2020 · ODL · 51 / 6 / 0

Understanding Learning Dynamics for Neural Machine Translation
Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi
05 Apr 2020 · 50 / 3 / 0

Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh, Oded Ben-David, H. Sompolinsky, Zohar Ringel
02 Apr 2020 · 116 / 23 / 0

Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi
25 Mar 2020 · 52 / 2 / 0

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali, Yan Sun, Robert Tibshirani
17 Mar 2020 · 103 / 77 / 0
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020 · 128 / 12 / 0

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
04 Mar 2020 · ODL · 237 / 241 / 0

Stagewise Enlargement of Batch Size for SGD-based Learning
Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
26 Feb 2020 · 50 / 1 / 0

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkateswara Dasari, S. E. Rouayheb
25 Feb 2020 · 69 / 16 / 0

Baryon acoustic oscillations reconstruction using convolutional neural networks
Tianxiang Mao, Jie-Shuang Wang, Baojiu Li, Yan-Chuan Cai, B. Falck, M. Neyrinck, A. Szalay
24 Feb 2020 · 61 / 13 / 0