
Don't Decay the Learning Rate, Increase the Batch Size

1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
    ODL

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 170 papers shown
A Group-Equivariant Autoencoder for Identifying Spontaneously Broken Symmetries
Devanshu Agrawal
A. Del Maestro
Steven Johnston
James Ostrowski
DRL
AI4CE
36
2
0
13 Feb 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli
Maria Refinetti
Giulio Biroli
21
7
0
09 Feb 2022
PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani
Andrea Zanelli
Andrea Martinelli
Tyler H. Summers
John Lygeros
33
14
0
01 Feb 2022
Computationally Efficient Approximations for Matrix-based Renyi's Entropy
Tieliang Gong
Yuxin Dong
Shujian Yu
B. Dong
67
2
0
27 Dec 2021
Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong
D. Kedziora
Katarzyna Musial
Bogdan Gabrys
25
26
0
16 Dec 2021
Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi
Rien van Leeuwen
W. Souidène
S. B. Saoud
SSL
ViT
25
2
0
08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
34
4
0
07 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
30
14
0
01 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
35
4
0
29 Oct 2021
A Sequence to Sequence Model for Extracting Multiple Product Name Entities from Dialog
Praneeth Gubbala
Xuan Zhang
16
1
0
28 Oct 2021
NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Yoichi Hirose
Nozomu Yoshinari
Shinichi Shirakawa
25
13
0
19 Oct 2021
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Yujing Ma
Florin Rusu
Kesheng Wu
A. Sim
46
3
0
13 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
35
7
0
11 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Gabriel Bénédict
Vincent Koops
Daan Odijk
Maarten de Rijke
35
30
0
24 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun
Shenggui Li
Jinyue Wang
Jun Yu
54
47
0
08 Aug 2021
Large-Scale Differentially Private BERT
Rohan Anil
Badih Ghazi
Vineet Gupta
Ravi Kumar
Pasin Manurangsi
36
132
0
03 Aug 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu
R. Kettimuthu
M. Papka
Ian Foster
34
3
0
22 Jun 2021
Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang
Xingyao Zhang
Shuaiwen Leon Song
Sara Hooker
25
75
0
22 Jun 2021
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
47
156
0
17 Jun 2021
On Large-Cohort Training for Federated Learning
Zachary B. Charles
Zachary Garrett
Zhouyuan Huo
Sergei Shmulyian
Virginia Smith
FedML
21
113
0
15 Jun 2021
Federated Learning with Buffered Asynchronous Aggregation
John Nguyen
Kshitiz Malik
Hongyuan Zhan
Ashkan Yousefpour
Michael G. Rabbat
Mani Malek
Dzmitry Huba
FedML
33
289
0
11 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
29
8
0
04 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
30
13
0
01 Jun 2021
Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev
Riccardo Riva
Michele Umassi
PINN
17
1
0
28 Apr 2021
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie
Li-xin Yuan
Zhanxing Zhu
Masashi Sugiyama
27
29
0
31 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
31
46
0
28 Feb 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong
Gang Yan
Rahul Singh
Jian Li
33
12
0
11 Feb 2021
Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song
Pan Pan
Kang Zhao
Hao Yang
Yiming Chen
Yingya Zhang
Yinghui Xu
Rong Jin
40
23
0
09 Feb 2021
Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot
Junqi Yin
Mallikarjun Shankar
21
1
0
16 Dec 2020
An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Federico Zocco
Seán F. McLoone
ODL
23
4
0
14 Dec 2020
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans
Irfan Essa
Dhruv Batra
3DPC
30
10
0
11 Dec 2020
Towards constraining warm dark matter with stellar streams through neural simulation-based inference
Joeri Hermans
N. Banik
Christoph Weniger
G. Bertone
Gilles Louppe
30
29
0
30 Nov 2020
Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
Lorenzo Valerio
F. M. Nardini
A. Passarella
R. Perego
25
12
0
17 Nov 2020
Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan
David Sussillo
Luke Metz
Ruoxi Sun
Jascha Narain Sohl-Dickstein
22
21
0
04 Nov 2020
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen
Jiquan Ngiam
Yanping Huang
Thang Luong
Henrik Kretzschmar
Yuning Chai
Dragomir Anguelov
41
206
0
14 Oct 2020
Improved generalization by noise enhancement
Takashi Mori
Masahito Ueda
24
3
0
28 Sep 2020
Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller
Mario Geiger
Tess E. Smidt
Frank Noé
16
75
0
19 Aug 2020
A Survey on Large-scale Machine Learning
Meng Wang
Weijie Fu
Xiangnan He
Shijie Hao
Xindong Wu
22
109
0
10 Aug 2020
Linear discriminant initialization for feed-forward neural networks
Marissa Masden
D. Sinha
FedML
29
3
0
24 Jul 2020
On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh
N. Kantas
P. Parpas
G. Pavliotis
28
12
0
15 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
27
37
0
09 Jul 2020
Coded Distributed Computing with Partial Recovery
Emre Ozfatura
S. Ulukus
Deniz Gunduz
35
28
0
04 Jul 2020
Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin
Do Yoon Kim
Joo Seong Jeong
Byung-Gon Chun
14
4
0
22 Jun 2020
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi
Zhishuai Guo
Yi Tian Xu
Rong Jin
Tianbao Yang
33
44
0
17 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
37
49
0
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
J. Lee
Tengyu Ma
32
94
0
15 Jun 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
8
15
0
15 Jun 2020
Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh
Mehrdad Farajtabar
Razvan Pascanu
H. Ghasemzadeh
CLL
21
219
0
12 Jun 2020