Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1901.06053
Cited By
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
18 January 2019
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks"
50 / 56 papers shown
Title
Understanding the Generalization Error of Markov algorithms through Poissonization
Benjamin Dupuis
Maxime Haddouche
George Deligiannidis
Umut Simsekli
47
0
0
11 Feb 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
44
1
0
12 Jan 2025
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen
Qiaobo Li
A. Banerjee
FedML
35
0
0
11 Nov 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
57
2
0
17 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
Florian Hübler
Ilyas Fatkhullin
Niao He
40
5
0
17 Oct 2024
Deep Kernel Posterior Learning under Infinite Variance Prior Weights
Jorge Loría
A. Bhadra
BDL
UQCV
61
0
0
02 Oct 2024
Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates
Puning Zhao
Jiafei Wu
Zhe Liu
Chong Wang
Rongfei Fan
Qingming Li
48
1
0
19 Aug 2024
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Benjamin Dupuis
Paul Viallard
George Deligiannidis
Umut Simsekli
48
2
0
26 Apr 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
39
1
0
12 Apr 2024
Investigation into the Training Dynamics of Learned Optimizers
Jan Sobotka
Petr Simánek
Daniel Vasata
28
0
0
12 Dec 2023
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha
Ruixuan Liu
Yi-xiao Liu
Hong Chen
52
1
0
06 Dec 2023
From Mutual Information to Expected Dynamics: New Generalization Bounds for Heavy-Tailed SGD
Benjamin Dupuis
Paul Viallard
18
3
0
01 Dec 2023
A Heavy-Tailed Algebra for Probabilistic Programming
Feynman T. Liang
Liam Hodgkinson
Michael W. Mahoney
18
3
0
15 Jun 2023
Learning Trajectories are Generalization Indicators
Jingwen Fu
Zhizheng Zhang
Dacheng Yin
Yan Lu
Nanning Zheng
AI4CE
33
3
0
25 Apr 2023
Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models
Anant Raj
Umut Simsekli
Alessandro Rudi
DiffM
31
1
0
30 Mar 2023
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Zijian Liu
Zhengyuan Zhou
27
10
0
22 Mar 2023
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
27
9
0
05 Mar 2023
Breaking the Lower Bound with (Little) Structure: Acceleration in Non-Convex Stochastic Optimization with Heavy-Tailed Noise
Zijian Liu
Jiawei Zhang
Zhengyuan Zhou
32
12
0
14 Feb 2023
U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Bryn Elesedy
Marcus Hutter
19
1
0
06 Feb 2023
An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni
Luca Biggio
Antonio Orvieto
F. Proske
Hans Kersting
Aurelien Lucchi
23
13
0
19 Jan 2023
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
31
6
0
05 Dec 2022
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
27
2
0
29 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang
Yongyi Mao
30
10
0
19 Nov 2022
Taming Fat-Tailed ("Heavier-Tailed'' with Potentially Infinite Variance) Noise in Federated Learning
Haibo Yang
Pei-Yuan Qiu
Jia Liu
FedML
27
12
0
03 Oct 2022
How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?
A. Elkordy
Jiang Zhang
Yahya H. Ezzeldin
Konstantinos Psounis
A. Avestimehr
FedML
35
38
0
03 Aug 2022
On uniform-in-time diffusion approximation for stochastic gradient descent
Lei Li
Yuliang Wang
48
3
0
11 Jul 2022
Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
Chengli Tan
Jiang Zhang
Junmin Liu
35
1
0
09 Jun 2022
Encoded Gradients Aggregation against Gradient Leakage in Federated Learning
Dun Zeng
Shiyu Liu
Siqi Liang
Zonghang Li
Hongya Wang
Irwin King
Zenglin Xu
FedML
16
0
0
26 May 2022
Heavy-Tail Phenomenon in Decentralized SGD
Mert Gurbuzbalaban
Yuanhan Hu
Umut Simsekli
Kun Yuan
Lingjiong Zhu
38
8
0
13 May 2022
An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate
Sayar Karmakar
Anirbit Mukherjee
21
0
0
26 Apr 2022
Flexible risk design using bi-directional dispersion
Matthew J. Holland
35
5
0
28 Mar 2022
Extended critical regimes of deep neural networks
Chengqing Qu
Asem Wardak
P. Gong
AI4CE
24
1
0
24 Mar 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurelien Lucchi
55
44
0
06 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
42
9
0
31 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
Tolga Birdal
Aaron Lou
Leonidas J. Guibas
Umut cSimcsekli
30
61
0
25 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
28
4
0
07 Nov 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
89
72
0
29 Sep 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin
Javier Sagastuy-Breña
Lauren Gillespie
Eshed Margalit
Hidenori Tanaka
Surya Ganguli
Daniel L. K. Yamins
31
15
0
19 Jul 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
30
15
0
15 Jun 2021
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
A. Camuto
George Deligiannidis
Murat A. Erdogdu
Mert Gurbuzbalaban
Umut cSimcsekli
Lingjiong Zhu
33
29
0
09 Jun 2021
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie
Li-xin Yuan
Zhanxing Zhu
Masashi Sugiyama
21
29
0
31 Mar 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Hongjian Wang
Mert Gurbuzbalaban
Lingjiong Zhu
Umut cSimcsekli
Murat A. Erdogdu
21
41
0
20 Feb 2021
Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes
Steffen Dereich
Sebastian Kassing
34
27
0
16 Feb 2021
Advances in Electron Microscopy with Deep Learning
Jeffrey M. Ede
32
2
0
04 Jan 2021
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu
Liu Ziyin
Masakuni Ueda
MLT
61
37
0
07 Dec 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
34
79
0
17 Sep 2020
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
Umut Simsekli
Ozan Sener
George Deligiannidis
Murat A. Erdogdu
44
55
0
16 Jun 2020
1
2
Next