SGD on Neural Networks Learns Functions of Increasing Complexity

28 May 2019
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
    MLT

Papers citing "SGD on Neural Networks Learns Functions of Increasing Complexity"

50 / 61 papers shown

A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
17 Feb 2025

Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure, Sahithya Ravi, R. Ng, Vered Shwartz, Boyang Albert Li, Leonid Sigal
ReLM, LRM, VLM
07 Dec 2024

Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng, Zheshun Wu, Shiyu Liu, Yu Pan, Xiaoying Tang, Zenglin Xu
MLT, FedML
25 Nov 2024

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
AI4CE, AAML
15 Oct 2024

SHAP values via sparse Fourier representation
Ali Gorji, Andisheh Amrollahi, A. Krause
FAtt
08 Oct 2024

Task Diversity Shortens the ICL Plateau
Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu
MoMe
07 Oct 2024

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikolaos Dimitriadis, Pascal Frossard, François Fleuret
MoE
10 Jul 2024

Learned feature representations are biased by complexity, learning order, position, and more
Andrew Kyle Lampinen, Stephanie C. Y. Chan, Katherine Hermann
AI4CE, FaML, SSL, OOD
09 May 2024

Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu, Da Kuang, Surbhi Goel
05 Mar 2024

Changing the Kernel During Training Leads to Double Descent in Kernel Regression
Oskar Allerbo
03 Nov 2023

How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
Simone Bombari, Marco Mondelli
AAML
20 May 2023

Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis
UQCV, BDL
13 Apr 2023

Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme, Nicolas Flammarion
02 Apr 2023

Learning time-scales in two-layers neural networks
Raphael Berthier, Andrea Montanari, Kangjie Zhou
28 Feb 2023

Do Neural Networks Generalize from Self-Averaging Sub-classifiers in the Same Way As Adaptive Boosting?
Michael Sun, Peter Chatain
AI4CE
14 Feb 2023

A Mathematical Model for Curriculum Learning for Parities
Elisabetta Cornacchia, Elchanan Mossel
31 Jan 2023

Supervision Complexity and its Role in Knowledge Distillation
Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Surinder Kumar
28 Jan 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin, Zhiyuan Li, Kaifeng Lyu, S. Du, Jason D. Lee
MLT
27 Jan 2023

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom
22 Nov 2022

Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka
15 Nov 2022

MaskTune: Mitigating Spurious Correlations by Forcing to Explore
Saeid Asgari Taghanaki, Aliasghar Khani, Fereshte Khani, A. Gholami, Linh-Tam Tran, Ali Mahdavi-Amiri, Ghassan Hamarneh
AAML
30 Sep 2022

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
Thomas George, Guillaume Lajoie, A. Baratin
19 Sep 2022

How Robust is Unsupervised Representation Learning to Distribution Shift?
Yuge Shi, Imant Daunhawer, Julia E. Vogt, Philip Torr, Amartya Sanyal
OOD
17 Jun 2022

Learning Dynamics and Generalization in Reinforcement Learning
Clare Lyle, Mark Rowland, Will Dabney, Marta Z. Kwiatkowska, Y. Gal
OOD, OffRL
05 Jun 2022

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion
ODL
02 Jun 2022

Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra
24 May 2022

Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
Yiping Lu, Jose H. Blanchet, Lexing Ying
15 May 2022

Robust Training under Label Noise by Over-parameterization
Sheng Liu, Zhihui Zhu, Qing Qu, Chong You
NoLa, OOD
28 Feb 2022

Deconstructing Distributions: A Pointwise Framework of Learning
Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
OOD
20 Feb 2022

On the Origins of the Block Structure Phenomenon in Neural Network Representations
Thao Nguyen, M. Raghu, Simon Kornblith
15 Feb 2022

Fortuitous Forgetting in Connectionist Networks
Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville
CLL
01 Feb 2022

Overview frequency principle/spectral bias in deep learning
Z. Xu, Tao Luo, Yaoyu Zhang
FaML
19 Jan 2022

Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
06 Dec 2021

Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Tao Luo, Yuqing Li, Zhongwang Zhang, Yaoyu Zhang, Z. Xu
30 Nov 2021

MedRDF: A Robust and Retrain-Less Diagnostic Framework for Medical Pretrained Models Against Adversarial Attack
Mengting Xu, Tao Zhang, Daoqiang Zhang
AAML, MedIm
29 Nov 2021

Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil, Raphael Gontijo-Lopes, Rebecca Roelofs
06 Oct 2021

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Ramé, Corentin Dancette, Matthieu Cord
OOD
07 Sep 2021

SplitGuard: Detecting and Mitigating Training-Hijacking Attacks in Split Learning
Ege Erdogan, Alptekin Kupcu, A. E. Cicek
AAML
20 Aug 2021

Deep Learning Through the Lens of Example Difficulty
R. Baldock, Hartmut Maennel, Behnam Neyshabur
17 Jun 2021

Learning distinct features helps, provably
Firas Laakom, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj
MLT
10 Jun 2021

FEAR: A Simple Lightweight Method to Rank Architectures
Debadeepta Dey, Shital C. Shah, Sébastien Bubeck
OOD
07 Jun 2021

A Little Robustness Goes a Long Way: Leveraging Robust Features for Targeted Transfer Attacks
Jacob Mitchell Springer, Melanie Mitchell, Garrett Kenyon
AAML
03 Jun 2021

Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton Van Den Hengel
12 May 2021

RATT: Leveraging Unlabeled Data to Guarantee Generalization
Saurabh Garg, Sivaraman Balakrishnan, J. Zico Kolter, Zachary Chase Lipton
01 May 2021

A neural anisotropic view of underspecification in deep learning
Guillermo Ortiz-Jiménez, I. Salazar-Reque, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard
29 Apr 2021

The Low-Rank Simplicity Bias in Deep Networks
Minyoung Huh, H. Mobahi, Richard Y. Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola
18 Mar 2021

Do Input Gradients Highlight Discriminative Features?
Harshay Shah, Prateek Jain, Praneeth Netrapalli
AAML, FAtt
25 Feb 2021

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Spencer Frei, Yuan Cao, Quanquan Gu
FedML, MLT
04 Jan 2021

Gradient Starvation: A Learning Proclivity in Neural Networks
Mohammad Pezeshki, Sekouba Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie
MLT
18 Nov 2020

A Bayesian Perspective on Training Speed and Model Selection
Clare Lyle, Lisa Schut, Binxin Ru, Y. Gal, Mark van der Wilk
27 Oct 2020