Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma
arXiv:2006.08680, 15 June 2020
Papers citing "Shape Matters: Understanding the Implicit Bias of the Noise Covariance" (50 of 68 shown):
- "Simplicity Bias via Global Convergence of Sharpness Minimization". Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka. 21 Oct 2024.
- "The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization". Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang. 15 Sep 2024.
- "Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition". Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland. 17 Jul 2024.
- "Stochastic Differential Equations models for Least-Squares Stochastic Gradient Descent". Adrien Schertzer, Loucas Pillaud-Vivien. 02 Jul 2024.
- "Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution". Naoki Yoshida, Shogo H. Nakakita, Masaaki Imaizumi. 23 Jun 2024.
- "How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD". Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio. 17 Jun 2024. [MLT]
- "Implicit Regularization of Gradient Flow on One-Layer Softmax Attention". Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou. 13 Mar 2024. [MLT]
- "Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks". Hristo Papazov, Scott Pesme, Nicolas Flammarion. 08 Mar 2024.
- "The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing". Yang Xu, Yihong Gu, Cong Fang. 03 Mar 2024.
- "Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training". Tom Sander, Maxime Sylvestre, Alain Durmus. 13 Feb 2024.
- "Understanding the Generalization Benefits of Late Learning Rate Decay". Yinuo Ren, Chao Ma, Lexing Ying. 21 Jan 2024. [AI4CE]
- "Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking". Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu. 30 Nov 2023. [AI4CE]
- "Generalization Bounds for Label Noise Stochastic Gradient Descent". Jung Eun Huh, Patrick Rebeschini. 01 Nov 2023.
- "How connectivity structure shapes rich and lazy learning in neural circuits". Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie. 12 Oct 2023.
- "A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent". Mingze Wang, Lei Wu. 01 Oct 2023.
- "A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time". Yeqi Gao, Zhao-quan Song, Weixin Wang, Junze Yin. 14 Sep 2023.
- "Transformers as Support Vector Machines". Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak. 31 Aug 2023.
- "The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning". Nikhil Ghosh, Spencer Frei, Wooseok Ha, Ting Yu. 06 Aug 2023. [MLT]
- "Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization". Kaiyue Wen, Zhiyuan Li, Tengyu Ma. 20 Jul 2023. [FAtt]
- "Max-Margin Token Selection in Attention Mechanism". Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak. 23 Jun 2023.
- "Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning". Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak. 14 Jun 2023. [MLT]
- "Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks". F. Chen, D. Kunin, Atsushi Yamamura, Surya Ganguli. 07 Jun 2023.
- "Saddle-to-Saddle Dynamics in Diagonal Linear Networks". Scott Pesme, Nicolas Flammarion. 02 Apr 2023.
- "Revisiting the Noise Model of Stochastic Gradient Descent". Barak Battash, Ofir Lindenbaum. 05 Mar 2023.
- "mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization". Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023. [AAML]
- "(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability". Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion. 17 Feb 2023.
- "Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning". Antonio Sclocchi, Mario Geiger, M. Wyart. 31 Jan 2023.
- "Deep networks for system identification: a Survey". G. Pillonetto, Aleksandr Aravkin, Daniel Gedon, L. Ljung, Antônio H. Ribeiro, Thomas B. Schön. 30 Jan 2023. [OOD]
- "Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization". Kayhan Behdin, Qingquan Song, Aman Gupta, D. Durfee, Ayan Acharya, S. Keerthi, Rahul Mazumder. 07 Dec 2022. [AAML]
- "On the Overlooked Structure of Stochastic Gradients". Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li. 05 Dec 2022.
- "Flatter, faster: scaling momentum for optimal speedup of SGD". Aditya Cowsik, T. Can, Paolo Glorioso. 28 Oct 2022.
- "Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models". Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022. [AI4CE]
- "Noise Injection as a Probe of Deep Learning Dynamics". Noam Levi, I. Bloch, M. Freytsis, T. Volansky. 24 Oct 2022.
- "SGD with Large Step Sizes Learns Sparse Features". Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion. 11 Oct 2022.
- "Incremental Learning in Diagonal Linear Networks". Raphael Berthier. 31 Aug 2022. [CLL, AI4CE]
- "On the Implicit Bias in Deep-Learning Algorithms". Gal Vardi. 26 Aug 2022. [FedML, AI4CE]
- "Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution". Jianhao Ma, S. Fattahi. 15 Jul 2022.
- "Towards understanding how momentum improves generalization in deep learning". Samy Jelassi, Yuanzhi Li. 13 Jul 2022. [ODL, MLT, AI4CE]
- "Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent". Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora. 08 Jul 2022.
- "The alignment property of SGD noise and how it helps select flat minima: A stability analysis". Lei Wu, Mingze Wang, Weijie Su. 06 Jul 2022. [MLT]
- "Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation". Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion. 20 Jun 2022. [NoLa]
- "Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence". Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma. 16 Jun 2022.
- "Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction". Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora. 14 Jun 2022. [FAtt]
- "Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)". Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang. 23 Mar 2022.
- "Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization". I Zaghloul Amir, Roi Livni, Nathan Srebro. 27 Feb 2022.
- "On Optimal Early Stopping: Over-informative versus Under-informative Parametrization". Ruoqi Shen, Liyao (Mars) Gao, Yi-An Ma. 20 Feb 2022.
- "Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably". Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao. 07 Feb 2022.
- "Anticorrelated Noise Injection for Improved Generalization". Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurélien Lucchi. 06 Feb 2022.
- "Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks". Noam Razin, Asaf Maman, Nadav Cohen. 27 Jan 2022.
- "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework". Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021. [MLT]