
arXiv:2002.05685 · Cited By
Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
13 February 2020
Umut Simsekli, Lingjiong Zhu, Yee Whye Teh, Mert Gurbuzbalaban

Papers citing "Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise"

26 / 26 papers shown
The Heavy-Tail Phenomenon in SGD
Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu
08 Jun 2020

Why are Adaptive Methods Good for Attention Models?
J.N. Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Surinder Kumar, S. Sra
06 Dec 2019

On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun
29 Nov 2019

Non-Gaussianity of Stochastic Gradient Noise
A. Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli
21 Oct 2019

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
T. H. Nguyen, Umut Simsekli, Mert Gurbuzbalaban, G. Richard
21 Jun 2019

Why gradient clipping accelerates training: A theoretical justification for adaptivity
J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie
28 May 2019

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks
Charles H. Martin, Michael W. Mahoney
24 Jan 2019

Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
T. H. Nguyen, Umut Simsekli, G. Richard
22 Jan 2019

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
18 Jan 2019

Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization
Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu
19 Dec 2018

Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration
Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu
12 Sep 2018

On sampling from a log-concave density using kinetic Langevin diffusions
A. Dalalyan, L. Riou-Durand
24 Jul 2018

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
Umut Simsekli, Çağatay Yıldız, T. H. Nguyen, G. Richard, A. Cemgil
07 Jun 2018

Stochastic Variance-Reduced Hamilton Monte Carlo Methods
Difan Zou, Pan Xu, Quanquan Gu
13 Feb 2018

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari, Stefano Soatto
30 Oct 2017

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
20 Jul 2017

Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo
Umut Simsekli
12 Jun 2017

Kinetic energy choice in Hamiltonian/hybrid Monte Carlo
Samuel Livingstone, Michael F Faulkner, Gareth O. Roberts
08 Jun 2017

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
13 Feb 2017

Relativistic Monte Carlo
Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian J. Vollmer
14 Sep 2016

Accurate and efficient numerical calculation of stable densities via optimized quadrature and asymptotics
Sebastian Ament, M. O’Neil
14 Jul 2016

A Variational Analysis of Stochastic Gradient Algorithms
Stephan Mandt, Matthew D. Hoffman, David M. Blei
08 Feb 2016

On the difficulty of training Recurrent Neural Networks
Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
21 Nov 2012

MCMC using Hamiltonian dynamics
Radford M. Neal
09 Jun 2012