1710.10345
The Implicit Bias of Gradient Descent on Separable Data
27 October 2017
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
Papers citing
"The Implicit Bias of Gradient Descent on Separable Data"
50 / 244 papers shown
Title
Embedding principle of homogeneous neural network for classification problem
Jiahan Zhang
Tao Luo
Yaoyu Zhang
9
0
0
18 May 2025
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models
Jialin Mao
Itay Griniasty
Yan Sun
Mark K. Transtrum
James P. Sethna
Pratik Chaudhari
29
0
0
13 May 2025
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
Yura Malitsky
Alexander Posch
27
0
0
05 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
55
0
0
02 May 2025
Gradient Descent as a Shrinkage Operator for Spectral Bias
Simon Lucey
38
0
0
25 Apr 2025
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe
LRM
65
1
0
14 Apr 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang
Peifeng Gao
Difan Zou
Yuan Cao
OOD
MLT
68
0
0
11 Apr 2025
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang
Jingfeng Wu
Licong Lin
Peter L. Bartlett
33
0
0
05 Apr 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao
Tina Behnia
V. Vakilian
Christos Thrampoulidis
68
9
0
20 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Sholom Schechtman
Nicolas Schreuder
212
0
0
08 Feb 2025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi
Antonio Orvieto
Seyed-Mohsen Moosavi-Dezfooli
AI4CE
AAML
198
0
0
15 Oct 2024
Variational Search Distributions
Daniel M. Steinberg
Rafael Oliveira
Cheng Soon Ong
Edwin V. Bonilla
33
0
0
10 Sep 2024
Input Space Mode Connectivity in Deep Neural Networks
Jakub Vrabel
Ori Shem-Ur
Yaron Oz
David Krueger
58
1
0
09 Sep 2024
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg
Matthias Hein
39
0
0
04 Jul 2024
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Subhojyoti Mukherjee
Josiah P. Hanna
Qiaomin Xie
Robert Nowak
84
2
0
07 Jun 2024
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl
Kim Stachenfeld
NAI
CoGe
73
6
0
26 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
50
13
0
05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
38
5
0
04 Apr 2024
Understanding the Double Descent Phenomenon in Deep Learning
Marc Lafon
Alexandre Thomas
25
2
0
15 Mar 2024
Neural Redshift: Random Networks are not Random Functions
Damien Teney
A. Nicolicioiu
Valentin Hartmann
Ehsan Abbasnejad
103
19
0
04 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
39
25
0
29 Feb 2024
Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features
Tina Behnia
Christos Thrampoulidis
SSL
39
0
0
29 Feb 2024
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Muthuraman Chidambaram
Rong Ge
AAML
35
0
0
10 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
37
15
0
08 Feb 2024
Fast and Exact Enumeration of Deep Networks Partitions Regions
Randall Balestriero
Yann LeCun
18
5
0
20 Jan 2024
An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification
Hyenkyun Woo
20
0
0
26 Dec 2023
Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges
Peifeng Gao
Qianqian Xu
Yibo Yang
Peisong Wen
Huiyang Shao
Zhiyong Yang
Guohao Li
Qingming Huang
AAML
31
3
0
12 Oct 2023
Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
A. Ma
Yangchen Pan
Amir-massoud Farahmand
AAML
25
5
0
13 Aug 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
38
26
0
20 Jul 2023
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
G. Buzaglo
Niv Haim
Gilad Yehudai
Gal Vardi
Yakir Oz
Yaniv Nikankin
Michal Irani
34
10
0
04 Jul 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
David X. Wu
A. Sahai
29
2
0
23 Jun 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Yuan Cao
Difan Zou
Yuan-Fang Li
Quanquan Gu
MLT
37
5
0
20 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui
Cong Ma
Yiqiao Zhong
25
7
0
06 Jun 2023
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff
Arthur Jacot
MLT
26
13
0
30 May 2023
Estimating class separability of text embeddings with persistent homology
Kostis Gourgoulias
Najah F. Ghalyan
Maxime Labonne
Yash Satsangi
Sean J. Moran
Joseph Sabelja
40
0
0
24 May 2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
Hossein Taheri
Christos Thrampoulidis
MLT
16
3
0
22 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu
Vladimir Braverman
Jason D. Lee
32
17
0
19 May 2023
Robust Implicit Regularization via Weight Normalization
H. Chou
Holger Rauhut
Rachel A. Ward
38
7
0
09 May 2023
Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV
BDL
21
16
0
13 Apr 2023
Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme
Nicolas Flammarion
33
35
0
02 Apr 2023
On the Stepwise Nature of Self-Supervised Learning
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
37
30
0
27 Mar 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
Kuo-Wei Lai
Vidya Muthukumar
31
5
0
13 Mar 2023
Penalising the biases in norm regularisation enforces sparsity
Etienne Boursier
Nicolas Flammarion
40
14
0
02 Mar 2023
Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent
Avrajit Ghosh
He Lyu
Xitong Zhang
Rongrong Wang
53
21
0
02 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
Emmanuel Abbe
Samy Bengio
Aryo Lotfi
Kevin Rizk
LRM
41
49
0
30 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin
Zhiyuan Li
Kaifeng Lyu
S. Du
Jason D. Lee
MLT
54
34
0
27 Jan 2023
A Stability Analysis of Fine-Tuning a Pre-Trained Model
Z. Fu
Anthony Man-Cho So
Nigel Collier
23
3
0
24 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients
David A. R. Robin
Kevin Scaman
Marc Lelarge
27
3
0
19 Jan 2023
Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure
Xiaoling Zhou
Ou Wu
Weiyao Zhu
Ziyang Liang
44
2
0
12 Jan 2023
Iterative regularization in classification via hinge loss diagonal descent
Vassilis Apidopoulos
T. Poggio
Lorenzo Rosasco
S. Villa
29
2
0
24 Dec 2022