On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 514 papers shown

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training Shen-Yi Zhao Chang-Wei Shi Yin-Peng Xie Wu-Jun Li ODL 18 8 0 28 Jul 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism Yosuke Oyama N. Maruyama Nikoli Dryden Erin McCarthy P. Harrington J. Balewski Satoshi Matsuoka Peter Nugent B. Van Essen 3DV AI4CE 32 37 0 25 Jul 2020
Linear discriminant initialization for feed-forward neural networks Marissa Masden D. Sinha FedML 29 3 0 24 Jul 2020
Explicit Regularisation in Gaussian Noise Injections A. Camuto M. Willetts Umut Simsekli Stephen J. Roberts Chris Holmes 23 55 0 14 Jul 2020
Beyond Graph Neural Networks with Lifted Relational Neural Networks Gustav Sourek F. Železný Ondrej Kuzelka NAI 41 17 0 13 Jul 2020
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training Weiyan Wang Cengguang Zhang Liu Yang Kai Chen Kun Tan 29 12 0 07 Jul 2020
When Does Preconditioning Help or Hurt Generalization? S. Amari Jimmy Ba Roger C. Grosse Xuechen Li Atsushi Nitanda Taiji Suzuki Denny Wu Ji Xu 36 32 0 18 Jun 2020
What Do Neural Networks Learn When Trained With Random Labels? Hartmut Maennel Ibrahim M. Alabdulmohsin Ilya O. Tolstikhin R. Baldock Olivier Bousquet Sylvain Gelly Daniel Keysers FedML 43 87 0 18 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training Diego Granziol S. Zohren Stephen J. Roberts ODL 37 48 0 16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance Jeff Z. HaoChen Colin Wei J. Lee Tengyu Ma 29 93 0 15 Jun 2020
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them Chen Liu Mathieu Salzmann Tao R. Lin Ryota Tomioka Sabine Süsstrunk AAML 21 81 0 15 Jun 2020
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption Xu Sun Zhiyuan Zhang Xuancheng Ren Ruixuan Luo Liangyou Li 22 39 0 10 Jun 2020
Speedy Performance Estimation for Neural Architecture Search Binxin Ru Clare Lyle Lisa Schut M. Fil Mark van der Wilk Y. Gal 18 36 0 08 Jun 2020
Automated Copper Alloy Grain Size Evaluation Using a Deep-learning CNN George S. Baggs P. Guerrier A. Loeb Jason C. Jones 21 9 0 20 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems Preetum Nakkiran MLT 25 21 0 15 May 2020
Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima Enzo Tartaglione Andrea Bragagnolo Marco Grangetto 26 11 0 30 Apr 2020
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training Sangkug Lym M. Erez 13 25 0 27 Apr 2020
Generative Data Augmentation for Commonsense Reasoning Yiben Yang Chaitanya Malaviya Jared Fernandez Swabha Swayamdipta Ronan Le Bras Ji-ping Wang Chandra Bhagavatula Yejin Choi Doug Downey LRM 22 91 0 24 Apr 2020
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks Majed El Helou Frederike Dumbgen Sabine Süsstrunk CLL AI4CE 30 2 0 07 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey Shuo Ouyang Dezun Dong Yemao Xu Liquan Xiao 30 12 0 06 Mar 2020
The large learning rate phase of deep learning: the catapult mechanism Aitor Lewkowycz Yasaman Bahri Ethan Dyer Jascha Narain Sohl-Dickstein Guy Gur-Ari ODL 159 234 0 04 Mar 2020
Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization S. Chatterjee ODL OOD 11 48 0 25 Feb 2020
The Two Regimes of Deep Network Training Guillaume Leclerc A. Madry 13 45 0 24 Feb 2020
Communication-Efficient Edge AI: Algorithms and Systems Yuanming Shi Kai Yang Tao Jiang Jun Zhang Khaled B. Letaief GNN 17 326 0 22 Feb 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks Stanislaw Jastrzebski Maciej Szymczak Stanislav Fort Devansh Arpit Jacek Tabor Kyunghyun Cho Krzysztof J. Geras 50 154 0 21 Feb 2020
Bayesian Deep Learning and a Probabilistic Perspective of Generalization A. Wilson Pavel Izmailov UQCV BDL OOD 24 639 0 20 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima Zeke Xie Issei Sato Masashi Sugiyama ODL 20 17 0 10 Feb 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well Vipul Gupta S. Serrano D. DeCoste MoMe 38 55 0 07 Jan 2020
'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space Moshir Harsh J. Tubiana Simona Cocco R. Monasson 9 14 0 30 Dec 2019
Optimization for deep learning: theory and algorithms Ruoyu Sun ODL 19 168 0 19 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers T. Nguyen Animesh Garg Richard G. Baraniuk Anima Anandkumar TPM 28 9 0 09 Dec 2019
The Group Loss for Deep Metric Learning Ismail Elezi Sebastiano Vascon Alessandro Torcinovich Marcello Pelillo Laura Leal-Taixe 14 50 0 01 Dec 2019
Information-Theoretic Local Minima Characterization and Regularization Zhiwei Jia Hao Su 27 19 0 19 Nov 2019
Small-GAN: Speeding Up GAN Training Using Core-sets Samarth Sinha Hang Zhang Anirudh Goyal Yoshua Bengio Hugo Larochelle Augustus Odena GAN 35 72 0 29 Oct 2019
KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment Vlad Hosu Hanhe Lin T. Szirányi Dietmar Saupe 14 552 0 14 Oct 2019
Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin Colin Wei Tengyu Ma AAML OOD 36 85 0 09 Oct 2019
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets S. A. Jacobs B. Van Essen D. Hysom Jae-Seung Yeom Tim Moon ... J. Gaffney Tom Benson Peter B. Robinson L. Peterson B. Spears BDL AI4CE 22 17 0 05 Oct 2019
GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks Avraam Chatzimichailidis Franz-Josef Pfreundt N. Gauger J. Keuper 19 10 0 26 Sep 2019
Towards Understanding the Transferability of Deep Representations Hong Liu Mingsheng Long Jianmin Wang Michael I. Jordan 30 25 0 26 Sep 2019
A Closer Look at Domain Shift for Deep Learning in Histopathology Karin Stacke Gabriel Eilertsen Jonas Unger Claes Lundström OOD 10 63 0 25 Sep 2019
EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training Yuqi Cui Yifan Xu Dongrui Wu 11 62 0 25 Sep 2019
Scale MLPerf-0.6 models on Google TPU-v3 Pods Sameer Kumar Victor Bitorff Dehao Chen Chi-Heng Chou Blake A. Hechtman ... Peter Mattson Shibo Wang Tao Wang Yuanzhong Xu Zongwei Zhou 8 39 0 21 Sep 2019
Understanding and Robustifying Differentiable Architecture Search Arber Zela T. Elsken Tonmoy Saikia Yassine Marrakchi Thomas Brox Frank Hutter OOD AAML 66 366 0 20 Sep 2019
Regularizing CNN Transfer Learning with Randomised Regression Yang Zhong A. Maki 16 13 0 16 Aug 2019
Visualizing and Understanding the Effectiveness of BERT Y. Hao Li Dong Furu Wei Ke Xu 22 181 0 15 Aug 2019
Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise Senwei Liang Zhongzhan Huang Mingfu Liang Haizhao Yang 27 57 0 12 Aug 2019
Progressive Transfer Learning Zhengxu Yu Long Wei Zhongming Jin Jianqiang Huang Deng Cai Xiansheng Hua VLM 24 10 0 07 Aug 2019
How Does Learning Rate Decay Help Modern Neural Networks? Kaichao You Mingsheng Long Jianmin Wang Michael I. Jordan 24 4 0 05 Aug 2019
On the Existence of Simpler Machine Learning Models Lesia Semenova Cynthia Rudin Ronald E. Parr 26 85 0 05 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training Saptadeep Pal Eiman Ebrahimi A. Zulfiqar Yaosheng Fu Victor Zhang Szymon Migacz D. Nellans Puneet Gupta 34 55 0 30 Jul 2019