arXiv: 1703.04933
Sharp Minima Can Generalize For Deep Nets
15 March 2017
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
Papers citing "Sharp Minima Can Generalize For Deep Nets" (50 of 165 shown)
Sufficient Invariant Learning for Distribution Shift
Taero Kim, Sungjun Lim, Kyungwoo Song · OOD · 24 Oct 2022

Rethinking Sharpness-Aware Minimization as Variational Inference
Szilvia Ujváry, Zsigmond Telek, A. Kerekes, Anna Mészáros, Ferenc Huszár · 19 Oct 2022

Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Nikolaos Dimitriadis, P. Frossard, François Fleuret · 18 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · 11 Oct 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao · AAML · 11 Oct 2022

The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett, Philip M. Long, Olivier Bousquet · 04 Oct 2022

Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Sungyub Kim, Si-hun Park, Kyungsu Kim, Eunho Yang · BDL · 30 Sep 2022

Deep Double Descent via Smooth Interpolation
Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour · 21 Sep 2022

Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang · FedML · 19 Sep 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi · FedML, AI4CE · 26 Aug 2022

A Deep Learning Approach for the solution of Probability Density Evolution of Stochastic Systems
S. Pourtakdoust, Amir H. Khodabakhsh · 05 Jul 2022

On Leave-One-Out Conditional Mutual Information For Generalization
Mohamad Rida Rammal, Alessandro Achille, Aditya Golatkar, Suhas Diggavi, Stefano Soatto · VLM · 01 Jul 2022

Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin · 17 Jun 2022

Efficiently Training Low-Curvature Neural Networks
Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, F. Fleuret · AAML · 14 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · FAtt · 14 Jun 2022

Towards Understanding Sharpness-Aware Minimization
Maksym Andriushchenko, Nicolas Flammarion · AAML · 13 Jun 2022

Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
Chengli Tan, Jiang Zhang, Junmin Liu · 09 Jun 2022

Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra · 24 May 2022

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying · 24 Apr 2022

Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio, Andrei Popescu-Belis · 20 Mar 2022

QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, F. Yu · MQ, VLM · 11 Mar 2022

Adversarial robustness of sparse local Lipschitz predictors
Ramchandran Muthukumar, Jeremias Sulam · AAML · 26 Feb 2022

On PAC-Bayesian reconstruction guarantees for VAEs
Badr-Eddine Chérief-Abdellatif, Yuyang Shi, Arnaud Doucet, Benjamin Guedj · DRL · 23 Feb 2022

Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan, Sebastian U. Stich · 18 Feb 2022

A Geometric Understanding of Natural Gradient
Qinxun Bai, S. Rosenberg, Wei Xu · 13 Feb 2022

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao, Hao Zhang, Xiuyuan Hu · 08 Feb 2022

Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi · 06 Feb 2022

When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner · ODL · 01 Feb 2022

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li · ODL · 31 Jan 2022

Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla, Jing Wang, A. Choromańska · 20 Jan 2022

Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
Yang Zhao, Hao Zhang · 16 Jan 2022

Visualizing the Loss Landscape of Winning Lottery Tickets
Robert Bain · UQCV · 16 Dec 2021

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai, Yuhua Zhu · 02 Dec 2021

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi · 07 Nov 2021

Using Graph-Theoretic Machine Learning to Predict Human Driver Behavior
Rohan Chandra, Aniket Bera, Dinesh Manocha · 04 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You · 01 Nov 2021

Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Konstantin Schürholt, Dimche Kostadinov, Damian Borth · SSL · 28 Oct 2021

Does the Data Induce Capacity Control in Deep Learning?
Rubing Yang, J. Mao, Pratik Chaudhari · 27 Oct 2021

Towards Better Plasticity-Stability Trade-off in Incremental Learning: A Simple Linear Connector
Guoliang Lin, Hanlu Chu, Hanjiang Lai · MoMe, CLL · 15 Oct 2021

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang, Yongyi Mao · FedML, MLT · 07 Oct 2021

Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng, Liu Cheng, Shin-Jye Lee, Xiaojun Zeng · 01 Oct 2021

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Ramé, Corentin Dancette, Matthieu Cord · OOD · 07 Sep 2021

Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru · 21 Aug 2021

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi · 12 Aug 2021

Batch Normalization Preconditioning for Neural Network Training
Susanna Lange, Kyle E. Helfrich, Qiang Ye · 02 Aug 2021

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani, M. Field · 21 Jul 2021

Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi, Luis Barba, Martin Jaggi · FedML · 25 Jun 2021

Minimum sharpness: Scale-invariant parameter-robustness of neural networks
Hikaru Ibayashi, Takuo Hamaguchi, Masaaki Imaizumi · 23 Jun 2021

Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators
David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele · AAML, MQ · 16 Apr 2021

Relating Adversarially Robust Generalization to Flat Minima
David Stutz, Matthias Hein, Bernt Schiele · OOD · 09 Apr 2021