ResearchTrend.AI
On the diffusion approximation of nonconvex stochastic gradient descent


22 May 2017
Junyang Qian
C. J. Li
Lei Li
Jianguo Liu
    DiffM

Papers citing "On the diffusion approximation of nonconvex stochastic gradient descent"

9 / 9 papers shown
Title
A General Continuous-Time Formulation of Stochastic ADMM and Its Variants
Chris Junchi Li
37
0
0
22 Apr 2024
Uniform Generalization Bound on Time and Inverse Temperature for Gradient Descent Algorithm and its Application to Analysis of Simulated Annealing
Keisuke Suzuki
AI4CE
33
0
0
25 May 2022
Weak Convergence of Approximate reflection coupling and its Application to Non-convex Optimization
Keisuke Suzuki
36
5
0
24 May 2022
Fluctuation-dissipation relations for stochastic gradient descent
Sho Yaida
32
73
0
28 Sep 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
27
118
0
24 Feb 2018
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
42
457
0
13 Nov 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,892
0
15 Sep 2016
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su
Stephen P. Boyd
Emmanuel J. Candes
108
1,157
0
04 Mar 2015
The Loss Surfaces of Multilayer Networks
A. Choromańska
Mikael Henaff
Michaël Mathieu
Gerard Ben Arous
Yann LeCun
ODL
183
1,186
0
30 Nov 2014