Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry [ODL] · 24 May 2017 · arXiv:1705.08741
Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 of 156 papers shown

On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis
Dominik Rivoir, Isabel Funke, Stefanie Speidel · 24 / 17 / 0 · 15 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao · 26 / 148 / 0 · 07 Mar 2022

Regularising for invariance to data augmentation improves supervised learning
Aleksander Botev, Matthias Bauer, Soham De · 32 / 14 / 0 · 07 Mar 2022

Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim · 19 / 19 / 0 · 02 Feb 2022

Memory-Efficient Backpropagation through Large Linear Layers
Daniel Bershatsky, A. Mikhalev, A. Katrutsa, Julia Gusak, D. Merkulov, Ivan Oseledets · 19 / 4 / 0 · 31 Jan 2022

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li [ODL] · 42 / 9 / 0 · 31 Jan 2022

Toward Training at ImageNet Scale with Differential Privacy
Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta · 36 / 100 / 0 · 28 Jan 2022

Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent
Riddhiman Bhattacharya, Tiefeng Jiang · 16 / 0 / 0 · 14 Dec 2021

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai, Yuhua Zhu · 27 / 4 / 0 · 02 Dec 2021

Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi, Rien van Leeuwen, W. Souidène, S. B. Saoud [SSL, ViT] · 25 / 2 / 0 · 08 Nov 2021

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi · 34 / 4 / 0 · 07 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You · 30 / 14 / 0 · 01 Nov 2021

Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz, Jonathan Frankle, Michael G. Rabbat, Ari S. Morcos, Nicolas Ballas [FedML] · 43 / 19 / 0 · 15 Oct 2021

Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil, Raphael Gontijo-Lopes, Rebecca Roelofs · 41 / 28 / 0 · 06 Oct 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 89 / 72 / 0 · 29 Sep 2021

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang, Lingjuan Lyu, Weiqiang Wang, Lichao Sun, Xu Sun · 21 / 35 / 0 · 03 Sep 2021

Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru · 35 / 2 / 0 · 21 Aug 2021

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi · 40 / 1 / 0 · 12 Aug 2021

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu · 54 / 47 / 0 · 08 Aug 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 31 / 15 / 0 · 19 Jul 2021

Bag of Tricks for Neural Architecture Search
T. Elsken, B. Staffler, Arber Zela, J. H. Metzen, Frank Hutter · 27 / 5 / 0 · 08 Jul 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith [FedML] · 21 / 113 / 0 · 15 Jun 2021

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You [ODL] · 28 / 13 / 0 · 01 Jun 2021

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen, Zhenyu (Allen) Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang [MQ] · 43 / 36 / 0 · 16 Apr 2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama · 27 / 29 / 0 · 31 Mar 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 44 / 78 / 0 · 24 Feb 2021

Low Curvature Activations Reduce Overfitting in Adversarial Training
Vasu Singla, Sahil Singla, David Jacobs, S. Feizi [AAML] · 32 / 45 / 0 · 15 Feb 2021

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong, Gang Yan, Rahul Singh, Jian Li · 28 / 12 / 0 · 11 Feb 2021

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan [VLM] · 223 / 512 / 0 · 11 Feb 2021

A spin-glass model for the loss surfaces of generative adversarial networks
Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel [GAN] · 30 / 12 / 0 · 07 Jan 2021

Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy, Lie He, Martin Jaggi [FedML, AAML] · 30 / 173 / 0 · 18 Dec 2020

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 16 / 1 / 0 · 16 Dec 2020

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu, Liu Ziyin, Masahito Ueda [MLT] · 61 / 37 / 0 · 07 Dec 2020

EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation using Accelerated Neuroevolution with Weight Transfer
William J. McNally, Kanav Vats, Alexander Wong, J. McPhee [3DH] · 30 / 16 / 0 · 17 Nov 2020

Regularizing Neural Networks via Adversarial Model Perturbation
Yaowei Zheng, Richong Zhang, Yongyi Mao [AAML] · 30 / 95 / 0 · 10 Oct 2020

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda · 24 / 3 / 0 · 28 Sep 2020

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé · 16 / 75 / 0 · 19 Aug 2020

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim, Wonpyo Park, Jongju Shin [CVBM] · 27 / 51 / 0 · 15 Aug 2020

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li [ODL] · 26 / 8 / 0 · 28 Jul 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin [ODL] · 27 / 37 / 0 · 09 Jul 2020

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts [ODL] · 37 / 49 / 0 · 16 Jun 2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma · 29 / 93 / 0 · 15 Jun 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran [MLT] · 31 / 21 / 0 · 15 May 2020

Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh, Oded Ben-David, H. Sompolinsky, Zohar Ringel · 19 / 20 / 0 · 02 Apr 2020

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Majed El Helou, Frederike Dümbgen, Sabine Süsstrunk [CLL, AI4CE] · 30 / 2 / 0 · 07 Mar 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith [ODL] · 19 / 20 / 0 · 24 Feb 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, A. Madry · 19 / 45 / 0 · 24 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama [ODL] · 28 / 17 / 0 · 10 Feb 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste [MoMe] · 38 / 55 / 0 · 07 Jan 2020

Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia, Hao Su · 27 / 19 / 0 · 19 Nov 2019