SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
arXiv:2210.05337 · 11 October 2022
Links: arXiv abstract · PDF · HTML · GitHub (32★)
Papers citing "SGD with Large Step Sizes Learns Sparse Features" (50 of 52 citing papers shown)
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett · 05 Apr 2025 · 2 citations

Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Chris Kolb, T. Weber, Bernd Bischl, David Rügamer · 04 Feb 2025 · 1 citation

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg, Matthias Hein · 04 Jul 2024 · 0 citations

Fine-tuning with Very Large Dropout
Jianyu Zhang, Léon Bottou · 01 Mar 2024 · 2 citations

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang · 18 Jul 2022 · 133 citations

On the Maximum Hessian Eigenvalue and Generalization
Simran Kaur, Jeremy M. Cohen, Zachary Chase Lipton · 21 Jun 2022 · 43 citations

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion · 20 Jun 2022 · 34 citations

Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions
Ning Yang, Chao Tang, Yuhai Tu · 02 Jun 2022 · 22 citations

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion · 02 Jun 2022 · 61 citations

On the Benefits of Large Learning Rates for Kernel Methods
Gaspard Beugnot, Julien Mairal, Alessandro Rudi · 28 Feb 2022 · 11 citations

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora · 13 Oct 2021 · 105 citations

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao · 07 Oct 2021 · 42 citations

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021 · 76 citations

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel · 30 Jun 2021 · 55 citations

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion · 17 Jun 2021 · 108 citations

Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian, Tengyu Ma, Jason D. Lee · 11 Jun 2021 · 120 citations

Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis
Stephan Wojtowytsch · 04 Jun 2021 · 34 citations

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Stephan Wojtowytsch · 04 May 2021 · 51 citations

Acceleration via Fractal Learning Rate Schedules
Naman Agarwal, Surbhi Goel, Cyril Zhang · 01 Mar 2021 · 18 citations

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar · 26 Feb 2021 · 277 citations

Strength of Minibatch Noise in SGD
Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda · 10 Feb 2021 · 35 citations

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste · 31 Jan 2021 · 725 citations

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De · 28 Jan 2021 · 204 citations

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu · 04 Nov 2020 · 18 citations

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora · 06 Oct 2020 · 75 citations

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur · 03 Oct 2020 · 1,358 citations

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
E. Moroshko, Suriya Gunasekar, Blake E. Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry · 13 Jul 2020 · 86 citations

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma · 15 Jun 2020 · 95 citations

Directional convergence and alignment in deep learning
Ziwei Ji, Matus Telgarsky · 11 Jun 2020 · 171 citations

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran · 15 May 2020 · 21 citations

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · 04 Mar 2020 · 241 citations

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Lénaïc Chizat, Francis R. Bach · 11 Feb 2020 · 341 citations

Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever · 04 Dec 2019 · 945 citations

An Exponential Learning Rate Schedule for Deep Learning
Zhiyuan Li, Sanjeev Arora · 16 Oct 2019 · 219 citations

Implicit Regularization for Optimal Sparse Recovery
Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini · 11 Sep 2019 · 103 citations

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma · 10 Jul 2019 · 299 citations

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu, Jian Li · 13 Jun 2019 · 336 citations

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro · 13 Jun 2019 · 366 citations

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process
Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant · 19 Apr 2019 · 147 citations

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach · 19 Dec 2018 · 839 citations

Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations
Qianxiao Li, Cheng Tai, Weinan E · 05 Nov 2018 · 150 citations

Three Mechanisms of Weight Decay Regularization
Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse · 29 Oct 2018 · 259 citations

Understanding Batch Normalization
Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger · 01 Jun 2018 · 612 citations

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin · 09 Mar 2018 · 3,485 citations

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio · 24 Feb 2018 · 119 citations

The Implicit Bias of Gradient Descent on Separable Data
Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro · 27 Oct 2017 · 924 citations

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio · 15 Mar 2017 · 774 citations

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · 10 Nov 2016 · 4,635 citations

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016 · 2,945 citations

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · 10 Dec 2015 · 194,426 citations