Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Hristo Papazov, Scott Pesme, Nicolas Flammarion
8 March 2024 · arXiv:2403.05293

Papers citing "Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks"

21 / 21 papers shown
Optimization Insights into Deep Diagonal Linear Networks
Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega
21 Dec 2024

Towards understanding how momentum improves generalization in deep learning
Samy Jelassi, Yuanzhi Li
Topics: ODL, MLT, AI4CE
13 Jul 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
08 Jul 2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion
Topics: NoLa
20 Jun 2022

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
Topics: PILM, LRM
05 Apr 2022

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
Topics: AI4TS
29 Mar 2022

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion
17 Jun 2021

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
15 Jun 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization
Vien V. Mai, M. Johansson
13 Feb 2020

Implicit Regularization for Optimal Sparse Recovery
Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini
11 Sep 2019

The Role of Memory in Stochastic Optimization
Antonio Orvieto, Jonas Köhler, Aurelien Lucchi
02 Jul 2019

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro
13 Jun 2019

Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
Topics: AI4CE
31 May 2019

Exponentiated Gradient Meets Gradient Descent
Udaya Ghai, Elad Hazan, Y. Singer
05 Feb 2019

Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu
22 Jan 2019

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
Bin Shi, S. Du, Michael I. Jordan, Weijie J. Su
21 Oct 2018

On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade
Topics: ODL
15 Mar 2018

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Topics: ODL
28 Nov 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Topics: HAI
10 Nov 2016

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes
04 Mar 2015