Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Hristo Papazov, Scott Pesme, Nicolas Flammarion
8 March 2024 · arXiv:2403.05293

Papers citing "Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks"

21 / 21 papers shown
Optimization Insights into Deep Diagonal Linear Networks
Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega
21 Dec 2024

Towards understanding how momentum improves generalization in deep learning
Samy Jelassi, Yuanzhi Li
Topics: ODL, MLT, AI4CE
13 Jul 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
08 Jul 2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion
Topics: NoLa
20 Jun 2022

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
Topics: PILM, LRM
05 Apr 2022

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
Topics: AI4TS
29 Mar 2022

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion
17 Jun 2021

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
15 Jun 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization
Vien V. Mai, M. Johansson
13 Feb 2020

Implicit Regularization for Optimal Sparse Recovery
Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini
11 Sep 2019

The Role of Memory in Stochastic Optimization
Antonio Orvieto, Jonas Köhler, Aurelien Lucchi
02 Jul 2019

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro
13 Jun 2019

Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
Topics: AI4CE
31 May 2019

Exponentiated Gradient Meets Gradient Descent
Udaya Ghai, Elad Hazan, Y. Singer
05 Feb 2019

Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu
22 Jan 2019

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
Bin Shi, S. Du, Michael I. Jordan, Weijie J. Su
21 Oct 2018

On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade
Topics: ODL
15 Mar 2018

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Topics: ODL
28 Nov 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Topics: HAI
10 Nov 2016

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes
04 Mar 2015