On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Dheevatsa Mudigere

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

14 / 514 papers shown

Title | Authors | Tags | Counts | Date
The Marginal Value of Adaptive Gradient Methods in Machine Learning | Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht | ODL | 20, 1,012, 0 | 23 May 2017
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions | Kleomenis Katevas, Ilias Leontiadis, M. Pielot, Joan Serrà | HAI | 11, 12, 0 | 17 May 2017
Nonlinear Information Bottleneck | Artemy Kolchinsky, Brendan D. Tracey, David Wolpert | - | 12, 152, 0 | 06 May 2017
Snapshot Ensembles: Train 1, get M for free | Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, J. Hopcroft, Kilian Q. Weinberger | OOD, FedML, UQCV | 45, 935, 0 | 01 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data | Gintare Karolina Dziugaite, Daniel M. Roy | - | 50, 799, 0 | 31 Mar 2017
Sharp Minima Can Generalize For Deep Nets | Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio | ODL | 46, 755, 0 | 15 Mar 2017
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks | Nanyang Ye, Zhanxing Zhu, Rafał K. Mantiuk | - | 16, 20, 0 | 13 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent | Ilja Kuzborskij, Christoph H. Lampert | MLT | 9, 165, 0 | 05 Mar 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation | Iacer Calixto, Qun Liu, Nick Campbell | - | 24, 154, 0 | 23 Jan 2017
Incremental Sequence Learning | E. Jong | CLL | 24, 5, 0 | 09 Nov 2016
Big Batch SGD: Automated Inference using Adaptive Batch Sizes | Soham De, A. Yadav, David Jacobs, Tom Goldstein | ODL | 14, 62, 0 | 18 Oct 2016
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability | J. Keuper, Franz-Josef Pfreundt | GNN | 55, 97, 0 | 22 Sep 2016
Parallelizing Word2Vec in Shared and Distributed Memory | Shihao Ji, N. Satish, Sheng Li, Pradeep Dubey | VLM, MoE | 14, 72, 0 | 15 Apr 2016
The Loss Surfaces of Multilayer Networks | A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun | ODL | 179, 1,185, 0 | 30 Nov 2014