The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin
18 December 2017 · arXiv:1712.06559

Papers citing "The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning" (50 of 76 papers shown)

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training · Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness · 19 May 2025
Better Rates for Random Task Orderings in Continual Linear Models [CLL] · Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro · 06 Apr 2025
How Does Critical Batch Size Scale in Pre-training? · Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade · 29 Oct 2024
Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models · Matteo Lapucci, Davide Pucci · 06 Aug 2024
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes · Antonio Orvieto, Lin Xiao · 05 Jul 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees · A. Banerjee, Qiaobo Li, Yingxue Zhou · 11 Jun 2024
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance · Dimitris Oikonomou, Nicolas Loizou · 06 Jun 2024
Demystifying SGD with Doubly Stochastic Gradients · Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner · 03 Jun 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation · Aaron Mishkin, Mert Pilanci, Mark Schmidt · 03 Apr 2024
Useful Compact Representations for Data-Fitting · Johannes J Brust · 18 Mar 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization [AAML] · Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee · 29 Nov 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition [BDL] · Samuel Horváth, Stefanos Laskaridis, Shashank Rajput, Hongyi Wang · 28 Aug 2023
A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks · Vignesh Kothapalli, Tom Tirer, Joan Bruna · 04 Jul 2023
Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method · Siwei Wang, S. Palmer · 19 May 2023
MoMo: Momentum Models for Adaptive Learning Rates · Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert Mansel Gower · 12 May 2023
Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting · Yitzchak Shmalo, Jonathan Jenkins, Oleksii Krupchytskyi · 15 Mar 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks [MLT] · Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro · 12 Feb 2023
Private optimization in the interpolation regime: faster rates and hardness results · Hilal Asi, Karan N. Chadha, Gary Cheng, John C. Duchi · 31 Oct 2022
Perturbation Analysis of Neural Collapse [AAML] · Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed · 29 Oct 2022
The Curious Case of Benign Memorization [AAML] · Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann · 25 Oct 2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale [ODL] · Ran Tian, Ankur P. Parikh · 21 Oct 2022
Stability of Accuracy for the Training of DNNs Via the Uniform Doubling Condition · Yitzchak Shmalo · 16 Oct 2022
On the generalization of learning algorithms that do not converge [MLT] · N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka · 16 Aug 2022
Improved Policy Optimization for Online Imitation Learning [OffRL] · J. Lavington, Sharan Vaswani, Mark W. Schmidt · 29 Jul 2022
SP2: A Second Order Stochastic Polyak Method · Shuang Li, W. Swartworth, Martin Takáč, Deanna Needell, Robert Mansel Gower · 17 Jul 2022
Beyond Uniform Lipschitz Condition in Differentially Private Optimization · Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi · 21 Jun 2022
On the fast convergence of minibatch heavy ball momentum · Raghu Bollapragada, Tyler Chen, Rachel A. Ward · 15 Jun 2022
On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms · Lam M. Nguyen, Trang H. Tran · 13 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization · Vignesh Kothapalli · 08 Jun 2022
The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models · Yiling Luo, X. Huo, Y. Mei · 29 Apr 2022
Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD · Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias · 26 Apr 2022
Random matrix analysis of deep neural network weight matrices · M. Thamm, Max Staats, B. Rosenow · 28 Mar 2022
Benchmark Assessment for DeepSpeed Optimization Library · G. Liang, I. Alsmadi · 12 Feb 2022
A Stochastic Bundle Method for Interpolating Networks · Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. P. Kumar · 29 Jan 2022
In Defense of the Unitary Scalarization for Deep Multi-Task Learning · Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar · 11 Jan 2022
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition · Sofia Broomé, Ernest Pokropek, Boyu Li, Hedvig Kjellström · 22 Dec 2021
On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons · Fangshuo Liao, Anastasios Kyrillidis · 05 Dec 2021
Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize · Ryan D'Orazio, Nicolas Loizou, I. Laradji, Ioannis Mitliagkas · 28 Oct 2021
Stochastic Training is Not Necessary for Generalization · Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021
Implicit Gradient Alignment in Distributed and Federated Learning [FedML] · Yatin Dandi, Luis Barba, Martin Jaggi · 25 Jun 2021
On Large-Cohort Training for Federated Learning [FedML] · Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021
A Geometric Analysis of Neural Collapse with Unconstrained Features · Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu · 06 May 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training · Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan · 21 Apr 2021
SVRG Meets AdaGrad: Painless Variance Reduction · Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark W. Schmidt, Simon Lacoste-Julien · 18 Feb 2021
On Riemannian Stochastic Approximation Schemes with Fixed Step-Size · Alain Durmus, P. Jiménez, Eric Moulines, Salem Said · 15 Feb 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training [FAtt] · Cong Fang, Hangfeng He, Qi Long, Weijie J. Su · 29 Jan 2021
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization · Abolfazl Hashemi, Anish Acharya, Rudrajit Das, H. Vikalo, Sujay Sanghavi, Inderjit Dhillon · 20 Nov 2020
Scaling Laws for Autoregressive Generative Modeling · T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish · 28 Oct 2020
Prevalence of Neural Collapse during the terminal phase of deep learning training · Vardan Papyan, Xuemei Han, D. Donoho · 18 Aug 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training [ODL] · Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · 09 Jul 2020