arXiv:2306.09000
When and Why Momentum Accelerates SGD: An Empirical Study
15 June 2023
Jingwen Fu
Bohan Wang
Huishuai Zhang
Zhizheng Zhang
Wei Chen
Na Zheng
Papers citing
"When and Why Momentum Accelerates SGD: An Empirical Study"
9 / 9 papers shown
Computational Analysis of Yaredawi YeZema Silt in Ethiopian Orthodox Tewahedo Church Chants
Mequanent Argaw Muluneh
Yan-Tsung Peng
Li Su
49
0
0
25 Dec 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
22
3
0
31 Oct 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
73
5
1
25 May 2024
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
Prin Phunyaphibarn
Junghyun Lee
Bohan Wang
Huishuai Zhang
Chulhee Yun
29
0
0
25 Nov 2023
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
50
8
0
27 Jul 2023
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson
Bettina Messmer
Martin Jaggi
35
12
0
26 May 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner
Jacques Chen
J. Lavington
Mark W. Schmidt
40
67
0
27 Apr 2023
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu
Zixuan Wang
Xiang Wang
Mo Zhou
Rong Ge
66
35
0
07 Oct 2022
Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora
Zhiyuan Li
A. Panigrahi
MLT
83
91
0
19 May 2022