Extrapolation for Large-batch Training in Deep Learning (arXiv:2006.05720)

10 June 2020
Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Papers citing "Extrapolation for Large-batch Training in Deep Learning"

14 papers shown:

 1. Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
    Marlon Becker, Frederick Altrock, Benjamin Risse · 22 Jan 2024
 2. Faster Federated Learning with Decaying Number of Local SGD Steps
    Jed Mills, Jia Hu, Geyong Min · 16 May 2023 · FedML
 3. The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
    Inyoung Paik, Jaesik Choi · 23 Apr 2023
 4. A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
    O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada · 21 Oct 2022 · AI4CE
 5. Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
    Lin Zhang, S. Shi, Wei Wang, Bo-wen Li · 30 Jun 2022
 6. Towards Understanding Sharpness-Aware Minimization
    Maksym Andriushchenko, Nicolas Flammarion · 13 Jun 2022 · AAML
 7. Tackling benign nonconvexity with smoothing and stochastic gradients
    Harsh Vardhan, Sebastian U. Stich · 18 Feb 2022
 8. Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
    Devansh Bisla, Jing Wang, A. Choromańska · 20 Jan 2022
 9. Implicit Gradient Alignment in Distributed and Federated Learning
    Yatin Dandi, Luis Barba, Martin Jaggi · 25 Jun 2021 · FedML
10. On Large-Cohort Training for Federated Learning
    Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021 · FedML
11. Consensus Control for Decentralized Deep Learning
    Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich · 09 Feb 2021
12. Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
    Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li · 28 Jul 2020 · ODL
13. Stochastic Nonconvex Optimization with Large Minibatches
    Weiran Wang, Nathan Srebro · 25 Sep 2017
14. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016 · ODL