arXiv:2002.03305
Momentum Improves Normalized SGD
9 February 2020
Ashok Cutkosky, Harsh Mehta
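For context on the method the citing papers build on: the update analyzed in "Momentum Improves Normalized SGD" normalizes the momentum buffer (an exponential moving average of gradients) rather than the raw gradient. A minimal sketch, assuming the standard form of the update; the function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def normalized_sgd_momentum_step(x, grad, m, lr=0.1, beta=0.9, eps=1e-8):
    """One step of normalized SGD with momentum (illustrative names).

    m is an exponential moving average of gradients; the parameter update
    has length (roughly) lr regardless of the gradient's scale, because
    the averaged direction m is normalized before being applied.
    """
    m = beta * m + (1.0 - beta) * grad          # momentum: EMA of gradients
    x = x - lr * m / (np.linalg.norm(m) + eps)  # fixed-length step along m
    return x, m
```

Normalizing the average m instead of each stochastic gradient means a single noisy gradient cannot dictate the step direction once the average has accumulated, which is the intuition behind combining momentum with normalization.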
Papers citing "Momentum Improves Normalized SGD" (25 papers):
1. Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization. Dmitry Kovalev. 16 Mar 2025.
2. Spectral-factorized Positive-definite Curvature Learning for NN Training. Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse. 10 Feb 2025.
3. Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees. Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajović, D. Jakovetić, S. Kar. 17 Oct 2024.
4. From Gradient Clipping to Normalization for Heavy Tailed SGD. Florian Hübler, Ilyas Fatkhullin, Niao He. 17 Oct 2024.
5. Gradient-Free Method for Heavily Constrained Nonconvex Optimization. Wanli Shi, Hongchang Gao, Bin Gu. 31 Aug 2024.
6. Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum. H. Cai, Sulaiman A. Alghunaim, Ali H. Sayed. 18 Jun 2024.
7. Does SGD really happen in tiny subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
8. Random Scaling and Momentum for Non-smooth Non-convex Optimization. Qinzi Zhang, Ashok Cutkosky. 16 May 2024.
9. Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs. Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal. 02 Apr 2024.
10. Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance. Qi Zhang, Yi Zhou, Shaofeng Zou. 01 Apr 2024.
11. Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks. Hristo Papazov, Scott Pesme, Nicolas Flammarion. 08 Mar 2024.
12. Non-Convex Stochastic Composite Optimization with Polyak Momentum. Yuan Gao, Anton Rodomanov, Sebastian U. Stich. 05 Mar 2024.
13. Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis. Jie Hao, Xiaochuan Gong, Mingrui Liu. 17 Jan 2024.
14. SING: A Plug-and-Play DNN Learning Technique. Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli. 25 May 2023.
15. On the Optimal Batch Size for Byzantine-Robust Distributed Learning. Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li. 23 May 2023.
16. Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods. Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He. 21 May 2023.
17. A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization. Ashwinee Panda, Xinyu Tang, Saeed Mahloujifar, Vikash Sehwag, Prateek Mittal. 08 Dec 2022.
18. Momentum Aggregation for Private Non-convex ERM. Hoang Tran, Ashok Cutkosky. 12 Oct 2022.
19. Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger. Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis. 14 Jun 2022.
20. A Novel Convergence Analysis for Algorithms of the Adam Family. Zhishuai Guo, Yi Tian Xu, W. Yin, R. L. Jin, Tianbao Yang. 07 Dec 2021.
21. AGGLIO: Global Optimization for Locally Convex Functions. Debojyoti Dey, B. Mukhoty, Purushottam Kar. 06 Nov 2021.
22. Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis. Jikai Jin, Samir Bhatt, Haiyang Wang, Liwei Wang. 24 Oct 2021.
23. ErrorCompensatedX: error compensation for variance reduced algorithms. Hanlin Tang, Yao Li, Ji Liu, Ming Yan. 04 Aug 2021.
24. Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness. Vien V. Mai, M. Johansson. 12 Feb 2021.
25. Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations. Yossi Arjevani, Y. Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan. 24 Jun 2020.