
The Power of Normalization: Faster Evasion of Saddle Points
Kfir Y. Levy
arXiv:1611.04831, 15 November 2016
Papers citing "The Power of Normalization: Faster Evasion of Saddle Points" (50 of 69 shown)
- Smoothed Normalization for Efficient Distributed Private Optimization [FedML] (20 Feb 2025)
  Egor Shulgin, Sarit Khirirat, Peter Richtárik

- From Gradient Clipping to Normalization for Heavy Tailed SGD (17 Oct 2024)
  Florian Hübler, Ilyas Fatkhullin, Niao He

- Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds (04 Jun 2024)
  Daniel Dodd, Louis Sharrock, Christopher Nemeth

- Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks [ODL] (23 Oct 2023)
  E. T. Oldewage, Ross M. Clarke, José Miguel Hernández-Lobato

- Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts [ODL] (09 Oct 2023)
  Lizhang Chen, Bo Liu, Kaizhao Liang, Qian Liu

- Toward Understanding Why Adam Converges Faster Than SGD for Transformers (31 May 2023)
  Yan Pan, Yuanzhi Li

- Understanding Predictive Coding as an Adaptive Trust-Region Method (29 May 2023)
  Francesco Innocenti, Ryan Singh, Christopher L. Buckley

- Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods (21 May 2023)
  Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He

- Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees (02 May 2023)
  Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

- EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data [FedML] (14 Feb 2023)
  M. Crawshaw, Yajie Bao, Mingrui Liu

- An SDE for Modeling SAM: Theory and Insights (19 Jan 2023)
  Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi

- Mitigating Memorization of Noisy Labels by Clipping the Model Prediction [VLM, NoLa] (08 Dec 2022)
  Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, Yixuan Li

- Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points (07 Dec 2022)
  Mayank Baranwal, Param Budhraja, V. Raj, A. Hota

- Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent (17 Nov 2022)
  Marco Bornstein, Jin-Peng Liu, Jingling Li, Furong Huang

- Dissecting adaptive methods in GANs (09 Oct 2022)
  Samy Jelassi, David Dobre, A. Mensch, Yuanzhi Li, Gauthier Gidel

- A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks (10 May 2022)
  Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

- Robust Training of Neural Networks Using Scale Invariant Architectures (02 Feb 2022)
  Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar

- On the Second-order Convergence Properties of Random Search Methods (25 Oct 2021)
  Aurelien Lucchi, Antonio Orvieto, Adamos Solomou

- Robust Distributed Optimization With Randomly Corrupted Gradients (28 Jun 2021)
  Berkay Turan, César A. Uribe, Hoi-To Wai, M. Alizadeh

- Backward Gradient Normalization in Deep Neural Networks [ODL] (17 Jun 2021)
  Alejandro Cabana, Luis F. Lago-Fernández

- Escaping Saddle Points Faster with Stochastic Momentum [ODL] (05 Jun 2021)
  Jun-Kun Wang, Chi-Heng Lin, Jacob D. Abernethy

- On the Differentially Private Nature of Perturbed Gradient Descent (18 Jan 2021)
  Thulasi Tholeti, Sheetal Kalyani

- Recent Theoretical Advances in Non-Convex Optimization (11 Dec 2020)
  Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard A. Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev

- A One-bit, Comparison-Based Gradient Estimator (06 Oct 2020)
  HanQin Cai, Daniel McKenzie, W. Yin, Zhenliang Zhang

- Improved Analysis of Clipping Algorithms for Non-convex Optimization (05 Oct 2020)
  Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

- Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization (04 Oct 2020)
  Jun-Kun Wang, Jacob D. Abernethy

- Binary Search and First Order Gradient Based Method for Stochastic Optimization [ODL] (27 Jul 2020)
  V. Pandey

- Quantum algorithms for escaping from saddle points (20 Jul 2020)
  Chenyi Zhang, Jiaqi Leng, Tongyang Li

- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [ODL] (01 Jun 2020)
  Z. Yao, A. Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney

- Online non-convex learning for river pollution source identification (22 May 2020)
  Wenjie Huang, Jing Jiang, Xiao Liu

- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping (21 May 2020)
  Eduard A. Gorbunov, Marina Danilova, Alexander Gasnikov

- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (06 Mar 2020)
  Esteban Real, Chen Liang, David R. So, Quoc V. Le

- The Geometry of Sign Gradient Descent [ODL] (19 Feb 2020)
  Lukas Balles, Fabian Pedregosa, Nicolas Le Roux

- Why are Adaptive Methods Good for Attention Models? (06 Dec 2019)
  J.N. Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Surinder Kumar, S. Sra

- Stationary Points of Shallow Neural Networks with Quadratic Activation Function (03 Dec 2019)
  D. Gamarnik, Eren C. Kizildag, Ilias Zadik

- Shadowing Properties of Optimization Algorithms (12 Nov 2019)
  Antonio Orvieto, Aurelien Lucchi

- Efficiently avoiding saddle points with zero order methods: No gradients required (29 Oct 2019)
  Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Georgios Piliouras

- Extending the step-size restriction for gradient descent to avoid strict saddle points (05 Aug 2019)
  Hayden Schaeffer, S. McCalla

- Efficiently escaping saddle points on manifolds (10 Jun 2019)
  Christopher Criscitiello, Nicolas Boumal

- Why gradient clipping accelerates training: A theoretical justification for adaptivity (28 May 2019)
  J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie

- On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics (30 Apr 2019)
  Xi Chen, S. Du, Xin T. Tong

- On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points (13 Feb 2019)
  Chi Jin, Praneeth Netrapalli, Rong Ge, Sham Kakade, Michael I. Jordan

- Escaping Saddle Points with Adaptive Gradient Methods [ODL] (26 Jan 2019)
  Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, S. Sra

- A Deterministic Gradient-Based Approach to Avoid Saddle Points [ODL] (21 Jan 2019)
  L. Kreusser, Stanley J. Osher, Bao Wang

- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview (25 Sep 2018)
  Yuejie Chi, Yue M. Lu, Yuxin Chen

- SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator (04 Jul 2018)
  Cong Fang, C. J. Li, Zhouchen Lin, Tong Zhang

- Finding Local Minima via Stochastic Nested Variance Reduction (22 Jun 2018)
  Dongruo Zhou, Pan Xu, Quanquan Gu

- Stochastic Nested Variance Reduction for Nonconvex Optimization (20 Jun 2018)
  Dongruo Zhou, Pan Xu, Quanquan Gu

- Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning [FedML] (14 Jun 2018)
  Dong Yin, Yudong Chen, Kannan Ramchandran, Peter L. Bartlett

- Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time [ODL] (23 May 2018)
  Hejian Sang, Jia-Wei Liu