Don't Decay the Learning Rate, Increase the Batch Size

1 November 2017

Samuel L. Smith

Pieter-Jan Kindermans

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown

Investigating the interaction between gradient-only line searches and different activation functions D. Kafka D. Wilke 48 0 0 23 Feb 2020
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning Karsten Roth Timo Milbich Samarth Sinha Prateek Gupta Bjorn Ommer Joseph Paul Cohen 187 173 0 19 Feb 2020
Rethinking the Hyperparameters for Fine-tuning Hao Li Pratik Chaudhari Hao Yang Michael Lam Avinash Ravichandran Rahul Bhotika Stefano Soatto VLM 93 130 0 19 Feb 2020
Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function M. Kawanaka Yuma Koizumi Ryoichi Miyazaki Kohei Yatabe AAML 70 23 0 14 Feb 2020
Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise Umut Simsekli Lingjiong Zhu Yee Whye Teh Mert Gurbuzbalaban 92 50 0 13 Feb 2020
Scalable and Practical Natural Gradient for Large-Scale Deep Learning Kazuki Osawa Yohei Tsuji Yuichiro Ueno Akira Naruse Chuan-Sheng Foo Rio Yokota 90 37 0 13 Feb 2020
Black-Box Optimization with Local Generative Surrogates S. Shirobokov V. Belavin Michael Kagan Andrey Ustyuzhanin A. G. Baydin 60 3 0 11 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima Zeke Xie Issei Sato Masashi Sugiyama ODL 127 17 0 10 Feb 2020
Depthwise-STFT based separable Convolutional Neural Networks Sudhakar Kumawat Shanmuganathan Raman OOD MDE 50 5 0 27 Jan 2020
Variance Reduction with Sparse Gradients Melih Elibol Lihua Lei Michael I. Jordan 67 23 0 27 Jan 2020
Data-Driven Permanent Magnet Temperature Estimation in Synchronous Motors with Supervised Machine Learning Wilhelm Kirchgässner Oliver Wallscheid J. Böcker 41 70 0 17 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well Vipul Gupta S. Serrano D. DeCoste MoMe 88 60 0 07 Jan 2020
CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity Konpat Preechakul B. Kijsirikul ODL 40 3 0 24 Dec 2019
Optimization for deep learning: theory and algorithms Ruoyu Sun ODL 137 169 0 19 Dec 2019
On the Bias-Variance Tradeoff: Textbooks Need an Update Brady Neal 43 18 0 17 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis Jonathan Frankle Gintare Karolina Dziugaite Daniel M. Roy Michael Carbin MoMe 201 630 0 11 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers T. Nguyen Animesh Garg Richard G. Baraniuk Anima Anandkumar TPM 113 9 0 09 Dec 2019
Observational Overfitting in Reinforcement Learning Xingyou Song Yiding Jiang Stephen Tu Yilun Du Behnam Neyshabur OffRL 134 140 0 06 Dec 2019
Neural Machine Translation: A Review and Survey Felix Stahlberg 3DV AI4TS MedIm 142 332 0 04 Dec 2019
A Multigrid Method for Efficiently Training Video Models Chaoxia Wu Ross B. Girshick Kaiming He Christoph Feichtenhofer Philipp Krahenbuhl 95 94 0 02 Dec 2019
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks Umut Simsekli Mert Gurbuzbalaban T. H. Nguyen G. Richard Levent Sagun 88 59 0 29 Nov 2019
Stage-based Hyper-parameter Optimization for Deep Learning Ahnjae Shin Dongjin Shin Sungwoo Cho Do Yoon Kim Eunji Jeong Gyeong-In Yu Byung-Gon Chun 31 4 0 24 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling Jack W. Rae Anna Potapenko Siddhant M. Jayakumar Timothy Lillicrap RALM VLM KELM 110 656 0 13 Nov 2019
Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels Yihan Jiang Hyeji Kim Himanshu Asnani Sreeram Kannan Sewoong Oh Pramod Viswanath 69 138 0 08 Nov 2019
Small-GAN: Speeding Up GAN Training Using Core-sets Samarth Sinha Hang Zhang Anirudh Goyal Yoshua Bengio Hugo Larochelle Augustus Odena GAN 99 77 0 29 Oct 2019
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs Koyel Mukherjee Alind Khare Ashish Verma 76 15 0 25 Oct 2019
Fast Exact Matrix Completion: A Unified Optimization Framework for Matrix Completion Dimitris Bertsimas M. Li 67 2 0 21 Oct 2019
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic Matteo Sordello Niccolò Dalmasso Hangfeng He Weijie Su 68 7 0 18 Oct 2019
Improving the convergence of SGD through adaptive batch sizes Scott Sievert Zachary B. Charles ODL 74 8 0 18 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay John Chen Cameron R. Wolfe Zhaoqi Li Anastasios Kyrillidis ODL 106 15 0 11 Oct 2019
Blink: Fast and Generic Collectives for Distributed ML Guanhua Wang Shivaram Venkataraman Amar Phanishayee J. Thelin Nikhil R. Devanur Ion Stoica VLM 67 142 0 11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization Jerry Ma Denis Yarats 106 70 0 09 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training John Shelton Hyatt Cameron R. Wolfe Michael Lee Yuxin Tang Anastasios Kyrillidis Christopher M. Jermaine OOD 92 39 0 04 Oct 2019
SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead A. Masullo Ligang He Toby Perrett Rui Mao Carsten Maple Majid Mirmehdi 111 319 0 03 Oct 2019
Stochastic gradient descent for hybrid quantum-classical optimization R. Sweke Frederik Wilde Johannes Jakob Meyer Maria Schuld Paul K. Fährmann Barthélémy Meynard-Piganeau Jens Eisert 105 241 0 02 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos Ji Lin Chuang Gan Song Han 78 10 0 01 Oct 2019
Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs Yuxian Meng Xiangyuan Ren Zijun Sun Xiaoya Li Arianna Yuan Leilei Gan Jiwei Li AIMat AI4CE 62 8 0 26 Sep 2019
Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle Michael Kaufmann K. Kourtis Celestine Mendler-Dünner Adrian Schüpbach Thomas Parnell 18 0 0 11 Sep 2019
Neural Architecture Search in Embedding Space Chunmiao Liu 64 0 0 09 Sep 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency Elad Hoffer Berry Weinstein Itay Hubara Tal Ben-Nun Torsten Hoefler Daniel Soudry 115 20 0 12 Aug 2019
EdgeNet: Semantic Scene Completion from a Single RGB-D Image Aloisio Dourado Teofilo de Campos Hansung Kim A. Hilton 3DV 3DPC 67 18 0 08 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training Saptadeep Pal Eiman Ebrahimi A. Zulfiqar Yaosheng Fu Victor Zhang Szymon Migacz D. Nellans Puneet Gupta 92 59 0 30 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization Jung-Kyun Cho Junseok Kwon Byung-Woo Hong 71 1 0 23 Jul 2019
Adaptive Weight Decay for Deep Neural Networks Kensuke Nakamura Byung-Woo Hong 63 43 0 21 Jul 2019
The University of Edinburgh's Submissions to the WMT19 News Translation Task Rachel Bawden Nikolay Bogoychev Ulrich Germann Roman Grundkiewicz Faheem Kirefu Antonio Valerio Miceli Barone Alexandra Birch 59 32 0 12 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks Yuanzhi Li Colin Wei Tengyu Ma 93 300 0 10 Jul 2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model Guodong Zhang Lala Li Zachary Nado James Martens Sushant Sachdeva George E. Dahl Christopher J. Shallue Roger C. Grosse 128 154 0 09 Jul 2019
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale A. G. Baydin Lei Shao W. Bhimji Lukas Heinrich Lawrence Meadows ... Philip Torr Victor W. Lee Kyle Cranmer P. Prabhat Frank Wood 82 58 0 08 Jul 2019
EPNAS: Efficient Progressive Neural Architecture Search Yanqi Zhou Peng Wang Sercan O. Arik Haonan Yu Syed Zawad Feng Yan G. Diamos 47 5 0 07 Jul 2019
The Adversarial Robustness of Sampling Omri Ben-Eliezer E. Yogev TTA AAML 63 48 0 26 Jun 2019