2303.02749
Revisiting the Noise Model of Stochastic Gradient Descent
5 March 2023
Barak Battash
Ofir Lindenbaum
Papers citing "Revisiting the Noise Model of Stochastic Gradient Descent"
30 / 30 papers shown
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
83
2
0
17 Oct 2024
Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework
Siyuan Yu
Wei Chen
H. V. Poor
66
0
0
17 Jun 2024
Power-law escape rate of SGD
Takashi Mori
Liu Ziyin
Kangqiao Liu
Masahito Ueda
46
19
0
20 May 2021
Refined Least Squares for Support Recovery
Ofir Lindenbaum
Stefan Steinerberger
16
6
0
19 Mar 2021
On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith
Benoit Dherin
David Barrett
Soham De
MLT
34
203
0
28 Jan 2021
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
Pan Zhou
Jiashi Feng
Chao Ma
Caiming Xiong
Guosheng Lin
Weinan E
71
234
0
12 Oct 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
MLT
49
99
0
26 Jun 2020
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng
Shiqi Gong
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
35
16
0
24 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
Jason D. Lee
Tengyu Ma
152
95
0
15 Jun 2020
Randomly Aggregated Least Squares for Support Recovery
Ofir Lindenbaum
Stefan Steinerberger
FedML
18
11
0
16 Mar 2020
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu
Wenqing Hu
Haoyi Xiong
Jun Huan
Vladimir Braverman
Zhanxing Zhu
MLT
37
10
0
18 Jun 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He
Gao Huang
Yang Yuan
ODL
MLT
61
150
0
02 Feb 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
82
247
0
18 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.6K
94,729
0
11 Oct 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
226
1,407
0
31 May 2018
Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler
K. Veschgini
M. Salmhofer
Fred Hamprecht
MoMe
105
432
0
02 Mar 2018
An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg
Yuanzhi Li
Yang Yuan
MLT
67
317
0
17 Feb 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
240
1,885
0
28 Dec 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
97
994
0
01 Nov 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
65
304
0
30 Oct 2017
Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo
Umut Simsekli
58
45
0
12 Jun 2017
The loss surface of deep and wide neural networks
Quynh N. Nguyen
Matthias Hein
ODL
148
284
0
26 Apr 2017
Stochastic Gradient Descent as Approximate Bayesian Inference
Stephan Mandt
Matthew D. Hoffman
David M. Blei
BDL
52
597
0
13 Apr 2017
A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics
Yuchen Zhang
Percy Liang
Moses Charikar
61
236
0
18 Feb 2017
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
Maxim Raginsky
Alexander Rakhlin
Matus Telgarsky
70
521
0
13 Feb 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
415
2,935
0
15 Sep 2016
A Variational Analysis of Stochastic Gradient Algorithms
Stephan Mandt
Matthew D. Hoffman
David M. Blei
50
161
0
08 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.1K
193,426
0
10 Dec 2015
Stochastic modified equations and adaptive stochastic gradient algorithms
Qianxiao Li
Cheng Tai
Weinan E
59
284
0
19 Nov 2015
Practical recommendations for gradient-based training of deep architectures
Yoshua Bengio
3DH
ODL
185
2,195
0
24 Jun 2012