Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma
15 June 2020 · arXiv:2006.08680

Papers citing "Shape Matters: Understanding the Implicit Bias of the Noise Covariance"
(showing 50 of 68 citing papers)
• Simplicity Bias via Global Convergence of Sharpness Minimization · Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka · 21 Oct 2024
• The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization · Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang · 15 Sep 2024
• Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition · Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland · 17 Jul 2024
• Stochastic Differential Equations models for Least-Squares Stochastic Gradient Descent · Adrien Schertzer, Loucas Pillaud-Vivien · 02 Jul 2024
• Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution · Naoki Yoshida, Shogo H. Nakakita, Masaaki Imaizumi · 23 Jun 2024
• How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD · Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio · 17 Jun 2024 · MLT
• Implicit Regularization of Gradient Flow on One-Layer Softmax Attention · Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou · 13 Mar 2024 · MLT
• Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks · Hristo Papazov, Scott Pesme, Nicolas Flammarion · 08 Mar 2024
• The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing · Yang Xu, Yihong Gu, Cong Fang · 03 Mar 2024
• Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training · Tom Sander, Maxime Sylvestre, Alain Durmus · 13 Feb 2024
• Understanding the Generalization Benefits of Late Learning Rate Decay · Yinuo Ren, Chao Ma, Lexing Ying · 21 Jan 2024 · AI4CE
• Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking · Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu · 30 Nov 2023 · AI4CE
• Generalization Bounds for Label Noise Stochastic Gradient Descent · Jung Eun Huh, Patrick Rebeschini · 01 Nov 2023
• How connectivity structure shapes rich and lazy learning in neural circuits · Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie · 12 Oct 2023
• A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent · Mingze Wang, Lei Wu · 01 Oct 2023
• A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time · Yeqi Gao, Zhao-quan Song, Weixin Wang, Junze Yin · 14 Sep 2023
• Transformers as Support Vector Machines · Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak · 31 Aug 2023
• The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning · Nikhil Ghosh, Spencer Frei, Wooseok Ha, Ting Yu · 06 Aug 2023 · MLT
• Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization · Kaiyue Wen, Zhiyuan Li, Tengyu Ma · 20 Jul 2023 · FAtt
• Max-Margin Token Selection in Attention Mechanism · Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak · 23 Jun 2023
• Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning · Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak · 14 Jun 2023 · MLT
• Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks · F. Chen, D. Kunin, Atsushi Yamamura, Surya Ganguli · 07 Jun 2023
• Saddle-to-Saddle Dynamics in Diagonal Linear Networks · Scott Pesme, Nicolas Flammarion · 02 Apr 2023
• Revisiting the Noise Model of Stochastic Gradient Descent · Barak Battash, Ofir Lindenbaum · 05 Mar 2023
• mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization · Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder · 19 Feb 2023 · AAML
• (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability · Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion · 17 Feb 2023
• Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning · Antonio Sclocchi, Mario Geiger, M. Wyart · 31 Jan 2023
• Deep networks for system identification: a Survey · G. Pillonetto, Aleksandr Aravkin, Daniel Gedon, L. Ljung, Antônio H. Ribeiro, Thomas B. Schon · 30 Jan 2023 · OOD
• Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization · Kayhan Behdin, Qingquan Song, Aman Gupta, D. Durfee, Ayan Acharya, S. Keerthi, Rahul Mazumder · 07 Dec 2022 · AAML
• On the Overlooked Structure of Stochastic Gradients · Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li · 05 Dec 2022
• Flatter, faster: scaling momentum for optimal speedup of SGD · Aditya Cowsik, T. Can, Paolo Glorioso · 28 Oct 2022
• Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models · Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma · 25 Oct 2022 · AI4CE
• Noise Injection as a Probe of Deep Learning Dynamics · Noam Levi, I. Bloch, M. Freytsis, T. Volansky · 24 Oct 2022
• SGD with Large Step Sizes Learns Sparse Features · Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · 11 Oct 2022
• Incremental Learning in Diagonal Linear Networks · Raphael Berthier · 31 Aug 2022 · CLL, AI4CE
• On the Implicit Bias in Deep-Learning Algorithms · Gal Vardi · 26 Aug 2022 · FedML, AI4CE
• Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution · Jianhao Ma, S. Fattahi · 15 Jul 2022
• Towards understanding how momentum improves generalization in deep learning · Samy Jelassi, Yuanzhi Li · 13 Jul 2022 · ODL, MLT, AI4CE
• Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent · Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora · 08 Jul 2022
• The alignment property of SGD noise and how it helps select flat minima: A stability analysis · Lei Wu, Mingze Wang, Weijie Su · 06 Jul 2022 · MLT
• Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation · Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion · 20 Jun 2022 · NoLa
• Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence · Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma · 16 Jun 2022
• Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction · Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · 14 Jun 2022 · FAtt
• Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) · Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang · 23 Mar 2022
• Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization · I Zaghloul Amir, Roi Livni, Nathan Srebro · 27 Feb 2022
• On Optimal Early Stopping: Over-informative versus Under-informative Parametrization · Ruoqi Shen, Liyao (Mars) Gao, Yi-An Ma · 20 Feb 2022
• Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably · Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao · 07 Feb 2022
• Anticorrelated Noise Injection for Improved Generalization · Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurélien Lucchi · 06 Feb 2022
• Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks · Noam Razin, Asaf Maman, Nadav Cohen · 27 Jan 2022
• What Happens after SGD Reaches Zero Loss? --A Mathematical Framework · Zhiyuan Li, Tianhao Wang, Sanjeev Arora · 13 Oct 2021 · MLT