The Implicit Bias of Gradient Descent on Separable Data

27 October 2017
Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
arXiv:1710.10345
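
For context on the result the citing papers below build on, a sketch of the paper's headline theorem (a paraphrase, not the authors' exact statement): for linearly separable data, gradient descent on the logistic loss, or any smooth loss with an exponential tail, with a sufficiently small step size sends the weight norm to infinity, while the weight direction converges, albeit at a slow O(1/log t) rate, to the L2 maximum-margin (hard-SVM) separator:

\[
\lim_{t \to \infty} \frac{w_t}{\lVert w_t \rVert}
  = \frac{\hat{w}}{\lVert \hat{w} \rVert},
\qquad
\hat{w} = \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i\, w^{\top} x_i \ge 1 \;\; \forall i.
\]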

Papers citing "The Implicit Bias of Gradient Descent on Separable Data"

50 of 244 citing papers shown, newest first.
• Malign Overfitting: Interpolation Can Provably Preclude Invariance
  Yoav Wald, G. Yona, Uri Shalit, Y. Carmon (28 Nov 2022)
• Mechanistic Mode Connectivity
  Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka (15 Nov 2022)
• Regression as Classification: Influence of Task Formulation on Neural Network Features
  Lawrence Stewart, Francis R. Bach, Quentin Berthet, Jean-Philippe Vert (10 Nov 2022)
• Do highly over-parameterized neural networks generalize since bad solutions are rare?
  Julius Martinetz, T. Martinetz (07 Nov 2022)
• Instance-Dependent Generalization Bounds via Optimal Transport
  Songyan Hou, Parnian Kassraie, Anastasis Kratsios, Andreas Krause, Jonas Rothfuss (02 Nov 2022)
• Grokking phase transitions in learning local rules with gradient descent
  Bojan Žunkovič, E. Ilievski (26 Oct 2022)
• Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures
  Xin Bing, M. Wegkamp (25 Oct 2022)
• Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma (25 Oct 2022) [AI4CE]
• Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors
  Qixun Wang, Yifei Wang, Hong Zhu, Yisen Wang (13 Oct 2022) [OOD]
• From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
  Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan (13 Oct 2022)
• SGD with Large Step Sizes Learns Sparse Features
  Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion (11 Oct 2022)
• Class-wise and reduced calibration methods
  Michael Panchenko, Anes Benmerzoug, Miguel de Benito Delgado (07 Oct 2022)
• Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
  Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, J. Uesato, Zachary Kenton (04 Oct 2022)
• The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
  Peter L. Bartlett, Philip M. Long, Olivier Bousquet (04 Oct 2022)
• Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks
  Sravanti Addepalli, Anshul Nasery, R. Venkatesh Babu, Praneeth Netrapalli, Prateek Jain (04 Oct 2022) [AAML]
• Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
  Arthur Jacot (29 Sep 2022)
• Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
  Alireza Mousavi-Hosseini, Sejun Park, M. Girotti, Ioannis Mitliagkas, Murat A. Erdogdu (29 Sep 2022) [MLT]
• Magnitude and Angle Dynamics in Training Single ReLU Neurons
  Sangmin Lee, Byeongsu Sim, Jong Chul Ye (27 Sep 2022) [MLT]
• Approximate Description Length, Covering Numbers, and VC Dimension
  Amit Daniely, Gal Katzhendler (26 Sep 2022)
• A Validation Approach to Over-parameterized Matrix and Image Recovery
  Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu (21 Sep 2022)
• Deep Linear Networks can Benignly Overfit when Shallow Ones Do
  Niladri S. Chatterji, Philip M. Long (19 Sep 2022)
• Importance Tempering: Group Robustness for Overparameterized Models
  Yiping Lu, Wenlong Ji, Zachary Izzo, Lexing Ying (19 Sep 2022)
• On Generalization of Decentralized Learning with Separable Data
  Hossein Taheri, Christos Thrampoulidis (15 Sep 2022) [FedML]
• Solving Elliptic Problems with Singular Sources using Singularity Splitting Deep Ritz Method
  Tianhao Hu, Bangti Jin, Zhi Zhou (07 Sep 2022)
• On the Implicit Bias in Deep-Learning Algorithms
  Gal Vardi (26 Aug 2022) [FedML, AI4CE]
• Intersection of Parallels as an Early Stopping Criterion
  Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani (19 Aug 2022) [MoMe]
• On the generalization of learning algorithms that do not converge
  N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka (16 Aug 2022) [MLT]
• Imbalance Trouble: Revisiting Neural-Collapse Geometry
  Christos Thrampoulidis, Ganesh Ramachandra Kini, V. Vakilian, Tina Behnia (10 Aug 2022)
• On the Activation Function Dependence of the Spectral Bias of Neural Networks
  Q. Hong, Jonathan W. Siegel, Qinyan Tan, Jinchao Xu (09 Aug 2022)
• Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
  Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora (08 Jul 2022)
• How many labelers do you have? A closer look at gold-standard labels
  Chen Cheng, Hilal Asi, John C. Duchi (24 Jun 2022)
• Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
  Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion (20 Jun 2022) [NoLa]
• Reconstructing Training Data from Trained Neural Networks
  Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani (15 Jun 2022)
• Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
  Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora (14 Jun 2022) [FAtt]
• On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms
  Lam M. Nguyen, Trang H. Tran (13 Jun 2022)
• Neural Collapse: A Review on Modelling Principles and Generalization
  Vignesh Kothapalli (08 Jun 2022)
• Adversarial Reprogramming Revisited
  Matthias Englert, R. Lazic (07 Jun 2022) [AAML]
• Building Robust Ensembles via Margin Boosting
  Dinghuai Zhang, Hongyang R. Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, A. Suggala (07 Jun 2022) [AAML, UQCV]
• What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective
  Rhea Chowers, Yair Weiss (06 Jun 2022)
• Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
  Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion (02 Jun 2022) [ODL]
• A Blessing of Dimensionality in Membership Inference through Regularization
  Jasper Tan, Daniel LeJeune, Blake Mason, Hamid Javadi, Richard G. Baraniuk (27 May 2022)
• The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
  Zixin Wen, Yuanzhi Li (12 May 2022) [SSL]
• Investigating Generalization by Controlling Normalized Margin
  Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue (08 May 2022)
• Why adversarial training can hurt robust accuracy
  Jacob Clarysse, Julia Hörrmann, Fanny Yang (03 Mar 2022) [AAML]
• Robust Training under Label Noise by Over-parameterization
  Sheng Liu, Zhihui Zhu, Qing Qu, Chong You (28 Feb 2022) [NoLa, OOD]
• The Spectral Bias of Polynomial Neural Networks
  Moulik Choraria, L. Dadi, Grigorios G. Chrysos, Julien Mairal, V. Cevher (27 Feb 2022)
• Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond
  Matan Schliserman, Tomer Koren (27 Feb 2022)
• Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
  I Zaghloul Amir, Roi Livni, Nathan Srebro (27 Feb 2022)
• A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments
  Randall Balestriero, Ishan Misra, Yann LeCun (16 Feb 2022)
• Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
  Romain Laroche, Rémi Tachet des Combes (15 Feb 2022)