ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

An Alternative View: When Does SGD Escape Local Minima? (arXiv:1802.06175)

Robert D. Kleinberg, Yuanzhi Li, Yang Yuan · MLT · 17 February 2018

Papers citing "An Alternative View: When Does SGD Escape Local Minima?"

50 / 69 papers shown
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
  Gabriel S. Gama, Valdir Grassi Jr · MoMe · 15 May 2025

Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
  Dmitry Kovalev · 16 Mar 2025

FOCUS: First Order Concentrated Updating Scheme
  Yizhou Liu, Ziming Liu, Jeff Gore · ODL · 21 Jan 2025

Evolutionary algorithms as an alternative to backpropagation for supervised training of Biophysical Neural Networks and Neural ODEs
  James Hazelden, Yuhan Helena Liu, Eli Shlizerman, E. Shea-Brown · 17 Nov 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

How to escape sharp minima with random perturbations
  Kwangjun Ahn, Ali Jadbabaie, S. Sra · ODL · 25 May 2023

On the Pareto Front of Multilingual Neural Machine Translation
  Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang · MoE · 06 Apr 2023

Revisiting the Noise Model of Stochastic Gradient Descent
  Barak Battash, Ofir Lindenbaum · 05 Mar 2023

DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
  Maor Ivgi, Oliver Hinder, Y. Carmon · ODL · 08 Feb 2023

Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
  Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee · 27 Jan 2023

Stability Analysis of Sharpness-Aware Minimization
  Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee · 16 Jan 2023

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
  Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan · 13 Oct 2022

OCD: Learning to Overfit with Conditional Diffusion Models
  Shahar Lutati, Lior Wolf · DiffM · 02 Oct 2022

On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks
  Yizhou Liu, Weijie J. Su, Tongyang Li · 29 Sep 2022

On the generalization of learning algorithms that do not converge
  N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka · MLT · 16 Aug 2022

A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics
  Lei Li, Yuliang Wang · 19 Jul 2022

On uniform-in-time diffusion approximation for stochastic gradient descent
  Lei Li, Yuliang Wang · 11 Jul 2022

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
  Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu · 15 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
  Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · FAtt · 14 Jun 2022

Perseus: A Simple and Optimal High-Order Method for Variational Inequalities
  Tianyi Lin, Michael I. Jordan · 06 May 2022

Byzantine Fault Tolerance in Distributed Machine Learning: a Survey
  Djamila Bouhata, Hamouma Moumen, Moumen Hamouma, Ahcène Bounceur · AI4CE · 05 May 2022

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
  Chao Ma, D. Kunin, Lei Wu, Lexing Ying · 24 Apr 2022

Sharper Utility Bounds for Differentially Private Models
  Yilin Kang, Yong Liu, Jian Li, Weiping Wang · FedML · 22 Apr 2022

Federated Minimax Optimization: Improved Convergence Analyses and Algorithms
  Pranay Sharma, Rohan Panda, Gauri Joshi, P. Varshney · FedML · 09 Mar 2022

Boosting Mask R-CNN Performance for Long, Thin Forensic Traces with Pre-Segmentation and IoU Region Merging
  Moritz Zink, M. Schiele, Pengcheng Fan, Stephan Gasterstädt · SSeg · 08 Mar 2022

Tackling benign nonconvexity with smoothing and stochastic gradients
  Harsh Vardhan, Sebastian U. Stich · 18 Feb 2022

Exact Solutions of a Deep Linear Network
  Liu Ziyin, Botao Li, Xiangmin Meng · ODL · 10 Feb 2022

Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi · 06 Feb 2022

When Do Flat Minima Optimizers Work?
  Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner · ODL · 01 Feb 2022

On generalization bounds for deep networks based on loss surface implicit regularization
  Masaaki Imaizumi, Johannes Schmidt-Hieber · ODL · 12 Jan 2022

In Defense of the Unitary Scalarization for Deep Multi-Task Learning
  Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar · 11 Jan 2022

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
  Hikaru Ibayashi, Masaaki Imaizumi · 07 Nov 2021

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent
  Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad · 21 Oct 2021

Sharpness-Aware Minimization Improves Language Model Generalization
  Dara Bahri, H. Mobahi, Yi Tay · 16 Oct 2021

Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints
  Shaojie Li, Yong Liu · 19 Jul 2021

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
  Satyen Kale, Ayush Sekhari, Karthik Sridharan · 11 Jul 2021

Stochastic Polyak Stepsize with a Moving Target
  Robert Mansel Gower, Aaron Defazio, Michael G. Rabbat · 22 Jun 2021

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
  Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, V. Samokhin, Sebastian U. Stich, Alexander Gasnikov · 15 Jun 2021

Problem-solving benefits of down-sampled lexicase selection
  Thomas Helmuth, Lee Spector · 10 Jun 2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits
  Tan Zhu, Guannan Liang, Chunjiang Zhu, HaiNing Li, J. Bi · 12 Apr 2021

Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search
  Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, A. Parashar, Christopher W. Fletcher · 02 Mar 2021

Consistent Sparse Deep Learning: Theory and Computation
  Y. Sun, Qifan Song, F. Liang · BDL · 25 Feb 2021

Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency
  Yuyang Deng, M. Mahdavi · 25 Feb 2021

Stochastic Gradient Langevin Dynamics with Variance Reduction
  Zhishen Huang, Stephen Becker · 12 Feb 2021

A spin-glass model for the loss surfaces of generative adversarial networks
  Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel · GAN · 07 Jan 2021

Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization
  Jun-Kun Wang, Jacob D. Abernethy · 04 Oct 2020

Solving Allen-Cahn and Cahn-Hilliard Equations using the Adaptive Physics Informed Neural Networks
  Colby Wight, Jia Zhao · 09 Jul 2020

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
  Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou · 18 Jun 2020

Geometry-Aware Gradient Algorithms for Neural Architecture Search
  Liam Li, M. Khodak, Maria-Florina Balcan, Ameet Talwalkar · 16 Apr 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
  Zeke Xie, Issei Sato, Masashi Sugiyama · ODL · 10 Feb 2020