Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le · 1 November 2017 · ODL
arXiv:1711.00489
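The title states the paper's central claim: the benefit of decaying the learning rate eps during SGD comes from shrinking the gradient-noise scale g = eps * (N/B - 1) ≈ eps * N/B (N training examples, batch size B), so the same effect can be obtained by holding eps fixed and growing B on the same schedule. Below is a minimal illustrative sketch of that substitution; it is not the authors' code, and the milestone epochs and the factor of 5 (which mirrors the decay factor used in the paper's CIFAR-10 experiments) are assumptions chosen for the example:

```python
# Illustrative sketch, not the authors' implementation: replace a step decay
# of the learning rate with an equivalent step increase of the batch size.
# Rationale from the paper: noise scale g ~ lr * N / B, so multiplying B by k
# changes g by the same amount as dividing lr by k.

def lr_decay_schedule(epoch, base_lr=0.1, batch=128, milestones=(60, 120, 160), k=5):
    """Conventional schedule: divide the learning rate by k at each milestone."""
    passed = sum(epoch >= m for m in milestones)  # milestones already crossed
    return base_lr / k**passed, batch

def batch_growth_schedule(epoch, base_lr=0.1, batch=128, milestones=(60, 120, 160), k=5):
    """Paper's alternative: keep the learning rate fixed, multiply batch size by k."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr, batch * k**passed

if __name__ == "__main__":
    for epoch in (0, 60, 120, 160):
        print(epoch, lr_decay_schedule(epoch), batch_growth_schedule(epoch))
```

The equivalence holds while B stays small relative to N; the paper grows the batch size only up to that point and falls back to a conventional learning-rate decay once B can no longer be increased.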
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size" (showing 50 of 170)
A Group-Equivariant Autoencoder for Identifying Spontaneously Broken Symmetries
Devanshu Agrawal, A. Del Maestro, Steven Johnston, James Ostrowski · 13 Feb 2022 · DRL, AI4CE

Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli · 09 Feb 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros · 01 Feb 2022

Computationally Efficient Approximations for Matrix-based Renyi's Entropy
Tieliang Gong, Yuxin Dong, Shujian Yu, B. Dong · 27 Dec 2021

Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong, D. Kedziora, Katarzyna Musial, Bogdan Gabrys · 16 Dec 2021

Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi, Rien van Leeuwen, W. Souidène, S. B. Saoud · 08 Nov 2021 · SSL, ViT

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi · 07 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You · 01 Nov 2021

BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny, Marina Neseem, Sherief Reda · 29 Oct 2021 · MQ

A Sequence to Sequence Model for Extracting Multiple Product Name Entities from Dialog
Praneeth Gubbala, Xuan Zhang · 28 Oct 2021

NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Yoichi Hirose, Nozomu Yoshinari, Shinichi Shirakawa · 19 Oct 2021

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Yujing Ma, Florin Rusu, Kesheng Wu, A. Sim · 13 Oct 2021

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang, Hua Wang, Weijie J. Su · 11 Oct 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen · 18 Sep 2021 · ODL

sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Gabriel Bénédict, Vincent Koops, Daan Odijk, Maarten de Rijke · 24 Aug 2021

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu · 08 Aug 2021

Large-Scale Differentially Private BERT
Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi · 03 Aug 2021

BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu, R. Kettimuthu, M. Papka, Ian Foster · 22 Jun 2021

Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker · 22 Jun 2021

Deep Learning Through the Lens of Example Difficulty
R. Baldock, Hartmut Maennel, Behnam Neyshabur · 17 Jun 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021 · FedML

Federated Learning with Buffered Asynchronous Aggregation
John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba · 11 Jun 2021 · FedML

Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier · 04 Jun 2021 · MoE

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · 01 Jun 2021 · ODL

Deep Neural Network as an alternative to Boosted Decision Trees for PID
Denis Stanev, Riccardo Riva, Michele Umassi · 28 Apr 2021 · PINN

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama · 31 Mar 2021

On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos · 28 Feb 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong, Gang Yan, Rahul Singh, Jian Li · 11 Feb 2021

Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin · 09 Feb 2021

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 16 Dec 2020

An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training
Federico Zocco, Seán F. McLoone · 14 Dec 2020 · ODL

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans, Irfan Essa, Dhruv Batra · 11 Dec 2020 · 3DPC

Towards constraining warm dark matter with stellar streams through neural simulation-based inference
Joeri Hermans, N. Banik, Christoph Weniger, G. Bertone, Gilles Louppe · 30 Nov 2020

Dynamic Hard Pruning of Neural Networks at the Edge of the Internet
Lorenzo Valerio, F. M. Nardini, A. Passarella, R. Perego · 17 Nov 2020

Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein · 04 Nov 2020

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov · 14 Oct 2020

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda · 28 Sep 2020

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé · 19 Aug 2020

A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu · 10 Aug 2020

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha · 24 Jul 2020 · FedML

On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis · 15 Jul 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · 09 Jul 2020 · ODL

Coded Distributed Computing with Partial Recovery
Emre Ozfatura, S. Ulukus, Deniz Gunduz · 04 Jul 2020

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun · 22 Jun 2020

An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi, Zhishuai Guo, Yi Tian Xu, Rong Jin, Tianbao Yang · 17 Jun 2020

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts · 16 Jun 2020 · ODL

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma · 15 Jun 2020

The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh · 15 Jun 2020

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh · 12 Jun 2020 · CLL