On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 514 papers shown

Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent Riddhiman Bhattacharya Tiefeng Jiang 16 0 0 14 Dec 2021
Image-to-Height Domain Translation for Synthetic Aperture Sonar Dylan Stewart Shawn F. Johnson Alina Zare 21 4 0 12 Dec 2021
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective Xiaowu Dai Yuhua Zhu 25 4 0 02 Dec 2021
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks Yaoyu Zhang Yuqing Li Zhongwang Zhang Tao Luo Z. Xu 26 21 0 30 Nov 2021
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Matías Mendieta Taojiannan Yang Pu Wang Minwoo Lee Zhengming Ding Cheng Chen FedML 24 158 0 28 Nov 2021
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping Xuran Meng Jianfeng Yao 22 7 0 26 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks Jing Liu Jianfei Cai Bohan Zhuang MQ 27 24 0 24 Nov 2021
TransMorph: Transformer for unsupervised medical image registration Junyu Chen Eric C. Frey Yufan He W. Paul Segars Ye Li Yong Du ViT MedIm 36 302 0 19 Nov 2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits Hao Chen Lili Zheng Raed Al Kontar Garvesh Raskutti 20 3 0 19 Nov 2021
Papaya: Practical, Private, and Scalable Federated Learning Dzmitry Huba John Nguyen Kshitiz Malik Ruiyu Zhu Michael G. Rabbat ... H. Srinivas Kaikai Wang Anthony Shoumikhin Jesik Min Mani Malek FedML 110 137 0 08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime Hikaru Ibayashi Masaaki Imaizumi 28 4 0 07 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey Xiaoxin He Fuzhao Xue Xiaozhe Ren Yang You 27 14 0 01 Nov 2021
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily Lun Du Xiaozhou Shi Qiang Fu Xiaojun Ma Hengyu Liu Shi Han Dongmei Zhang 40 104 0 29 Oct 2021
RoMA: Robust Model Adaptation for Offline Model-based Optimization Sihyun Yu Sungsoo Ahn Le Song Jinwoo Shin OffRL 27 31 0 27 Oct 2021
Stable Anderson Acceleration for Deep Learning Massimiliano Lupo Pasini Junqi Yin Viktor Reshniak M. Stoyanov 15 4 0 26 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization Dara Bahri H. Mobahi Yi Tay 127 98 0 16 Oct 2021
Trade-offs of Local SGD at Scale: An Empirical Study Jose Javier Gonzalez Ortiz Jonathan Frankle Michael G. Rabbat Ari S. Morcos Nicolas Ballas FedML 37 19 0 15 Oct 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks R. Entezari Hanie Sedghi O. Saukh Behnam Neyshabur MoMe 37 216 0 12 Oct 2021
Not all noise is accounted equally: How differentially private learning benefits from large sampling rates Friedrich Dörmann Osvald Frisk L. Andersen Christian Fischer Pedersen FedML 59 25 0 12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations Jiayao Zhang Hua Wang Weijie J. Su 32 7 0 11 Oct 2021
Observations on K-image Expansion of Image-Mixing Augmentation for Classification Joonhyun Jeong Sungmin Cha Jongwon Choi Sangdoo Yun Taesup Moon Y. Yoo VLM 21 6 0 08 Oct 2021
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting Chengyu Dong Liyuan Liu Jingbo Shang NoLa AAML 56 18 0 07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications Ziqiao Wang Yongyi Mao FedML MLT 37 22 0 07 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in Generalization Sara Fridovich-Keil Raphael Gontijo-Lopes Rebecca Roelofs 41 28 0 06 Oct 2021
Perturbated Gradients Updating within Unit Space for Deep Learning Ching-Hsun Tseng Liu Cheng Shin-Jye Lee Xiaojun Zeng 40 5 0 01 Oct 2021
Accelerating Encrypted Computing on Intel GPUs Yujia Zhai Mohannad Ibrahim Yiqin Qiu Fabian Boemer Zizhong Chen Alexey Titov Alexander Lyashevsky 26 26 0 29 Sep 2021
Stochastic Training is Not Necessary for Generalization Jonas Geiping Micah Goldblum Phillip E. Pope Michael Moeller Tom Goldstein 89 72 0 29 Sep 2021
Scalable deeper graph neural networks for high-performance materials property prediction Sadman Sadeed Omee Steph-Yves M. Louis Nihang Fu Lai Wei Sourin Dey Rongzhi Dong Qinyang Li Jianjun Hu 70 73 0 25 Sep 2021
Towards Generalized and Incremental Few-Shot Object Detection Yiting Li H. Zhu Jun Ma C. Teo Chen Xiang P. Vadakkepat T. Lee CLL ObjD 26 9 0 23 Sep 2021
DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture Kaichen Zhou Lanqing Hong Shuailiang Hu Fengwei Zhou Binxin Ru Jiashi Feng Zhenguo Li 56 10 0 13 Sep 2021
MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning T. Alkhalifah Hanchen Wang O. Ovcharenko OOD 47 65 0 11 Sep 2021
Adversarial Parameter Defense by Multi-Step Risk Minimization Zhiyuan Zhang Ruixuan Luo Xuancheng Ren Qi Su Liangyou Li Xu Sun AAML 25 6 0 07 Sep 2021
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data Zhiyuan Zhang Lingjuan Lyu Weiqiang Wang Lichao Sun Xu Sun 21 35 0 03 Sep 2021
Shift-Curvature, SGD, and Generalization Arwen V. Bradley C. Gomez-Uribe Manish Reddy Vuyyuru 35 2 0 21 Aug 2021
Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks Yantong Wang Ye Hu Zhaohui Yang Walid Saad Kai‐Kit Wong V. Friderikos 23 4 0 15 Aug 2021
Logit Attenuating Weight Normalization Aman Gupta R. Ramanath Jun Shi Anika Ramachandran Sirou Zhou Mingzhou Zhou S. Keerthi 37 1 0 12 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters Chen Sun Shenggui Li Jinyue Wang Jun Yu 54 47 0 08 Aug 2021
Batch Normalization Preconditioning for Neural Network Training Susanna Lange Kyle E. Helfrich Qiang Ye 27 9 0 02 Aug 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II Yossi Arjevani M. Field 28 18 0 21 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion D. Kunin Javier Sagastuy-Breña Lauren Gillespie Eshed Margalit Hidenori Tanaka Surya Ganguli Daniel L. K. Yamins 31 15 0 19 Jul 2021
Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering Nairouz Mrabah Mohamed Bouguessa M. Touati Riadh Ksantini 35 62 0 19 Jul 2021
Point-Cloud Deep Learning of Porous Media for Permeability Prediction Ali Kashefi T. Mukerji 3DPC AI4CE 17 34 0 18 Jul 2021
The Bayesian Learning Rule Mohammad Emtiyaz Khan Håvard Rue BDL 63 73 0 09 Jul 2021
Activated Gradients for Deep Neural Networks Mei Liu Liangming Chen Xiaohao Du Long Jin Mingsheng Shang ODL AI4CE 27 135 0 09 Jul 2021
What can linear interpolation of neural network loss landscapes tell us? Tiffany J. Vlaar Jonathan Frankle MoMe 27 27 0 30 Jun 2021
Implicit Gradient Alignment in Distributed and Federated Learning Yatin Dandi Luis Barba Martin Jaggi FedML 23 31 0 25 Jun 2021
Sparse Flows: Pruning Continuous-depth Models Lucas Liebenwein Ramin Hasani Alexander Amini Daniela Rus 26 16 0 24 Jun 2021
Minimum sharpness: Scale-invariant parameter-robustness of neural networks Hikaru Ibayashi Takuo Hamaguchi Masaaki Imaizumi 25 5 0 23 Jun 2021
Deep Learning Through the Lens of Example Difficulty R. Baldock Hartmut Maennel Behnam Neyshabur 47 156 0 17 Jun 2021
On Large-Cohort Training for Federated Learning Zachary B. Charles Zachary Garrett Zhouyuan Huo Sergei Shmulyian Virginia Smith FedML 21 113 0 15 Jun 2021