Momentum Improves Normalized SGD
Ashok Cutkosky, Harsh Mehta · 9 February 2020 · arXiv:2002.03305
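
For context, a minimal sketch of the normalized SGD with momentum update that the paper's title refers to, assuming the standard formulation (an exponential moving average of stochastic gradients, followed by a fixed-length step along its direction); the function and variable names below are illustrative, not taken from the authors' code.

```python
import numpy as np

def normalized_sgd_momentum(grad_fn, x0, lr=0.01, beta=0.9, steps=1000):
    """Sketch of normalized SGD with momentum: step along the *direction*
    of the momentum buffer, so the step length is lr regardless of the
    gradient's scale."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)  # momentum buffer
    for _ in range(steps):
        g = grad_fn(x)                    # stochastic gradient estimate
        m = beta * m + (1.0 - beta) * g   # exponential moving average
        norm = np.linalg.norm(m)
        if norm > 0:
            x -= lr * m / norm            # normalized step of length lr
    return x

# Toy usage: minimize a quadratic under noisy gradients.
rng = np.random.default_rng(0)
grad = lambda x: 2.0 * x + 0.1 * rng.standard_normal(x.shape)
print(normalized_sgd_momentum(grad, np.array([5.0, -3.0])))
```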

Papers citing "Momentum Improves Normalized SGD" (25 papers)
• Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
  Dmitry Kovalev · 16 Mar 2025
• Spectral-factorized Positive-definite Curvature Learning for NN Training
  Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse · 10 Feb 2025
• Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
  Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajović, D. Jakovetić, S. Kar · 17 Oct 2024
• From Gradient Clipping to Normalization for Heavy Tailed SGD
  Florian Hübler, Ilyas Fatkhullin, Niao He · 17 Oct 2024
• Gradient-Free Method for Heavily Constrained Nonconvex Optimization
  Wanli Shi, Hongchang Gao, Bin Gu · 31 Aug 2024
• Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
  H. Cai, Sulaiman A. Alghunaim, Ali H. Sayed · 18 Jun 2024
• Does SGD really happen in tiny subspaces?
  Minhak Song, Kwangjun Ahn, Chulhee Yun · 25 May 2024
• Random Scaling and Momentum for Non-smooth Non-convex Optimization
  Qinzi Zhang, Ashok Cutkosky · 16 May 2024
• Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs
  Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal · 02 Apr 2024
• Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
  Qi Zhang, Yi Zhou, Shaofeng Zou · 01 Apr 2024
• Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
  Hristo Papazov, Scott Pesme, Nicolas Flammarion · 08 Mar 2024
• Non-Convex Stochastic Composite Optimization with Polyak Momentum
  Yuan Gao, Anton Rodomanov, Sebastian U. Stich · 05 Mar 2024
• Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis
  Jie Hao, Xiaochuan Gong, Mingrui Liu · 17 Jan 2024
• SING: A Plug-and-Play DNN Learning Technique
  Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli · 25 May 2023
• On the Optimal Batch Size for Byzantine-Robust Distributed Learning
  Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li · 23 May 2023
• Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
  Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He · 21 May 2023
• A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
  Ashwinee Panda, Xinyu Tang, Saeed Mahloujifar, Vikash Sehwag, Prateek Mittal · 08 Dec 2022
• Momentum Aggregation for Private Non-convex ERM
  Hoang Tran, Ashok Cutkosky · 12 Oct 2022
• Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
  Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis · 14 Jun 2022
• A Novel Convergence Analysis for Algorithms of the Adam Family
  Zhishuai Guo, Yi Tian Xu, W. Yin, Rong Jin, Tianbao Yang · 07 Dec 2021
• AGGLIO: Global Optimization for Locally Convex Functions
  Debojyoti Dey, B. Mukhoty, Purushottam Kar · 06 Nov 2021
• Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
  Jikai Jin, Samir Bhatt, Haiyang Wang, Liwei Wang · 24 Oct 2021
• ErrorCompensatedX: error compensation for variance reduced algorithms
  Hanlin Tang, Yao Li, Ji Liu, Ming Yan · 04 Aug 2021
• Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
  Vien V. Mai, M. Johansson · 12 Feb 2021
• Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
  Yossi Arjevani, Y. Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan · 24 Jun 2020