ResearchTrend.AI
SGD with memory: fundamental properties and stochastic acceleration (arXiv:2410.04228)
Dmitry Yarotsky, Maksim Velikanov
5 October 2024

Papers citing "SGD with memory: fundamental properties and stochastic acceleration"

34 papers shown
  • Corner Gradient Descent (Dmitry Yarotsky; 16 Apr 2025)
  • Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models (Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi; 17 Aug 2023)
  • A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta (Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky; 22 Jun 2022)
  • Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions (Kiwon Lee, Andrew N. Cheng, Courtney Paquette, Elliot Paquette; 02 Jun 2022)
  • Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties (Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington; 14 May 2022)
  • More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize (Alexander Wei, Wei Hu, Jacob Steinhardt; 11 Mar 2022)
  • Accelerated SGD for Non-Strongly-Convex Least Squares (Aditya Varre, Nicolas Flammarion; 03 Mar 2022)
  • Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions (Maksim Velikanov, Dmitry Yarotsky; 02 Feb 2022)
  • Neural Networks as Kernel Learners: The Silent Alignment Effect [MLT] (Alexander B. Atanasov, Blake Bordelon, Cengiz Pehlevan; 29 Oct 2021)
  • Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression (Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham Kakade; 12 Oct 2021)
  • What can linearized neural networks actually say about generalization? (Guillermo Ortiz-Jiménez, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard; 12 Jun 2021)
  • Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [ODL] (Courtney Paquette, Elliot Paquette; 07 Jun 2021)
  • Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime (Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová; 31 May 2021)
  • Benign Overfitting of Constant-Stepsize SGD for Linear Regression (Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade; 23 Mar 2021)
  • Fast Adaptation with Linearized Neural Networks (Wesley J. Maddox, Shuai Tang, Pablo G. Moreno, A. Wilson, Andreas C. Damianou; 02 Mar 2021)
  • Approximation and Learning with Deep Convolutional Models: a Kernel Perspective (A. Bietti; 19 Feb 2021)
  • Explaining Neural Scaling Laws (Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma; 12 Feb 2021)
  • SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality (Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette; 08 Feb 2021)
  • Last iterate convergence of SGD for Least-Squares in the Interpolation regime (Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion; 05 Feb 2021)
  • Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel (Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli; 28 Oct 2020)
  • Finite Versus Infinite Neural Networks: an Empirical Study (Jaehoon Lee, S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Narain Sohl-Dickstein; 31 Jul 2020)
  • Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks (Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan; 23 Jun 2020)
  • Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model (Raphael Berthier, Francis R. Bach, Pierre Gaillard; 15 Jun 2020)
  • Frequency Bias in Neural Networks for Input of Non-Uniform Density (Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, S. Kritchman; 10 Mar 2020)
  • The large learning rate phase of deep learning: the catapult mechanism [ODL] (Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari; 04 Mar 2020)
  • A Fine-Grained Spectral Perspective on Neural Networks (Greg Yang, Hadi Salman; 24 Jul 2019)
  • Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington; 18 Feb 2019)
  • Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits [MLT] (Xialiang Dou, Tengyuan Liang; 21 Jan 2019)
  • On Lazy Training in Differentiable Programming (Lénaïc Chizat, Edouard Oyallon, Francis R. Bach; 19 Dec 2018)
  • Neural Tangent Kernel: Convergence and Generalization in Neural Networks (Arthur Jacot, Franck Gabriel, Clément Hongler; 20 Jun 2018)
  • Stochastic Composite Least-Squares Regression with convergence rate O(1/n) (Nicolas Flammarion, Francis R. Bach; 21 Feb 2017)
  • Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression [ODL] (Aymeric Dieuleveut, Nicolas Flammarion, Francis R. Bach; 17 Feb 2016)
  • From Averaging to Acceleration, There is Only a Step-size (Nicolas Flammarion, Francis R. Bach; 07 Apr 2015)
  • Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) (Francis R. Bach, Eric Moulines; 10 Jun 2013)