The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin
18 December 2017 · arXiv:1712.06559

Papers citing "The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning" (50 of 76 papers shown)

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training · Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness · 19 May 2025
Better Rates for Random Task Orderings in Continual Linear Models [CLL] · Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro · 06 Apr 2025
How Does Critical Batch Size Scale in Pre-training? · Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade · 29 Oct 2024
Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models · Matteo Lapucci, Davide Pucci · 06 Aug 2024
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes · Antonio Orvieto, Lin Xiao · 05 Jul 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees · A. Banerjee, Qiaobo Li, Yingxue Zhou · 11 Jun 2024
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance · Dimitris Oikonomou, Nicolas Loizou · 06 Jun 2024
Demystifying SGD with Doubly Stochastic Gradients · Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner · 03 Jun 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation · Aaron Mishkin, Mert Pilanci, Mark Schmidt · 03 Apr 2024
Useful Compact Representations for Data-Fitting · Johannes J Brust · 18 Mar 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization [AAML] · Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee · 29 Nov 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition [BDL] · Samuel Horváth, Stefanos Laskaridis, Shashank Rajput, Hongyi Wang · 28 Aug 2023
A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks · Vignesh Kothapalli, Tom Tirer, Joan Bruna · 04 Jul 2023
Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method · Siwei Wang, S. Palmer · 19 May 2023
MoMo: Momentum Models for Adaptive Learning Rates · Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert Mansel Gower · 12 May 2023
Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting · Yitzchak Shmalo, Jonathan Jenkins, Oleksii Krupchytskyi · 15 Mar 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks [MLT] · Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro · 12 Feb 2023
Private optimization in the interpolation regime: faster rates and hardness results · Hilal Asi, Karan N. Chadha, Gary Cheng, John C. Duchi · 31 Oct 2022
Perturbation Analysis of Neural Collapse [AAML] · Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed · 29 Oct 2022
The Curious Case of Benign Memorization [AAML] · Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann · 25 Oct 2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale [ODL] · Ran Tian, Ankur P. Parikh · 21 Oct 2022
Stability of Accuracy for the Training of DNNs Via the Uniform Doubling Condition · Yitzchak Shmalo · 16 Oct 2022
On the generalization of learning algorithms that do not converge [MLT] · N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka · 16 Aug 2022
Improved Policy Optimization for Online Imitation Learning [OffRL] · J. Lavington, Sharan Vaswani, Mark W. Schmidt · 29 Jul 2022
SP2: A Second Order Stochastic Polyak Method · Shuang Li, W. Swartworth, Martin Takáč, Deanna Needell, Robert Mansel Gower · 17 Jul 2022
Beyond Uniform Lipschitz Condition in Differentially Private Optimization · Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi · 21 Jun 2022
On the fast convergence of minibatch heavy ball momentum · Raghu Bollapragada, Tyler Chen, Rachel A. Ward · 15 Jun 2022
On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms · Lam M. Nguyen, Trang H. Tran · 13 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization · Vignesh Kothapalli · 08 Jun 2022
The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models · Yiling Luo, X. Huo, Y. Mei · 29 Apr 2022
Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD · Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias · 26 Apr 2022
Random matrix analysis of deep neural network weight matrices · M. Thamm, Max Staats, B. Rosenow · 28 Mar 2022
Benchmark Assessment for DeepSpeed Optimization Library · G. Liang, I. Alsmadi · 12 Feb 2022
A Stochastic Bundle Method for Interpolating Networks · Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. P. Kumar · 29 Jan 2022
In Defense of the Unitary Scalarization for Deep Multi-Task Learning · Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar · 11 Jan 2022
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition · Sofia Broomé, Ernest Pokropek, Boyu Li, Hedvig Kjellström · 22 Dec 2021
On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons · Fangshuo Liao, Anastasios Kyrillidis · 05 Dec 2021
Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize · Ryan D'Orazio, Nicolas Loizou, I. Laradji, Ioannis Mitliagkas · 28 Oct 2021
Stochastic Training is Not Necessary for Generalization · Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021
Implicit Gradient Alignment in Distributed and Federated Learning [FedML] · Yatin Dandi, Luis Barba, Martin Jaggi · 25 Jun 2021
On Large-Cohort Training for Federated Learning [FedML] · Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021
A Geometric Analysis of Neural Collapse with Unconstrained Features · Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu · 06 May 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training · Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan · 21 Apr 2021
SVRG Meets AdaGrad: Painless Variance Reduction · Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark W. Schmidt, Simon Lacoste-Julien · 18 Feb 2021
On Riemannian Stochastic Approximation Schemes with Fixed Step-Size · Alain Durmus, P. Jiménez, Eric Moulines, Salem Said · 15 Feb 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training [FAtt] · Cong Fang, Hangfeng He, Qi Long, Weijie J. Su · 29 Jan 2021
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization · Abolfazl Hashemi, Anish Acharya, Rudrajit Das, H. Vikalo, Sujay Sanghavi, Inderjit Dhillon · 20 Nov 2020
Scaling Laws for Autoregressive Generative Modeling · T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish · 28 Oct 2020
Prevalence of Neural Collapse during the terminal phase of deep learning training · Vardan Papyan, Xuemei Han, D. Donoho · 18 Aug 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training [ODL] · Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · 09 Jul 2020