Revisiting the Noise Model of Stochastic Gradient Descent

5 March 2023

Papers citing "Revisiting the Noise Model of Stochastic Gradient Descent"

Title
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees Aleksandar Armacki Shuhua Yu Pranay Sharma Gauri Joshi Dragana Bajović D. Jakovetić S. Kar 83 2 0 17 Oct 2024
Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework Siyuan Yu Wei Chen H. V. Poor 66 0 0 17 Jun 2024
Power-law escape rate of SGD Takashi Mori Liu Ziyin Kangqiao Liu Masahito Ueda 46 19 0 20 May 2021
Refined Least Squares for Support Recovery Ofir Lindenbaum Stefan Steinerberger 16 6 0 19 Mar 2021
On the Origin of Implicit Regularization in Stochastic Gradient Descent Samuel L. Smith Benoit Dherin David Barrett Soham De MLT 34 203 0 28 Jan 2021
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning Pan Zhou Jiashi Feng Chao Ma Caiming Xiong Guosheng Lin Weinan E 71 234 0 12 Oct 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent Samuel L. Smith Erich Elsen Soham De MLT 49 99 0 26 Jun 2020
Dynamic of Stochastic Gradient Descent with State-Dependent Noise Qi Meng Shiqi Gong Wei Chen Zhi-Ming Ma Tie-Yan Liu 35 16 0 24 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance Jeff Z. HaoChen Colin Wei Jason D. Lee Tengyu Ma 152 95 0 15 Jun 2020
Randomly Aggregated Least Squares for Support Recovery Ofir Lindenbaum Stefan Steinerberger FedML 18 11 0 16 Mar 2020
On the Noisy Gradient Descent that Generalizes as SGD Jingfeng Wu Wenqing Hu Haoyi Xiong Jun Huan Vladimir Braverman Zhanxing Zhu MLT 37 10 0 18 Jun 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima Haowei He Gao Huang Yang Yuan ODL MLT 61 150 0 02 Feb 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks Umut Simsekli Levent Sagun Mert Gurbuzbalaban 82 247 0 18 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.6K 94,729 0 11 Oct 2018
Neural Network Acceptability Judgments Alex Warstadt Amanpreet Singh Samuel R. Bowman 226 1,407 0 31 May 2018
Essentially No Barriers in Neural Network Energy Landscape Felix Dräxler K. Veschgini M. Salmhofer Fred Hamprecht MoMe 105 432 0 02 Mar 2018
An Alternative View: When Does SGD Escape Local Minima? Robert D. Kleinberg Yuanzhi Li Yang Yuan MLT 67 317 0 17 Feb 2018
Visualizing the Loss Landscape of Neural Nets Hao Li Zheng Xu Gavin Taylor Christoph Studer Tom Goldstein 240 1,885 0 28 Dec 2017
Don't Decay the Learning Rate, Increase the Batch Size Samuel L. Smith Pieter-Jan Kindermans Chris Ying Quoc V. Le ODL 97 994 0 01 Nov 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks Pratik Chaudhari Stefano Soatto MLT 65 304 0 30 Oct 2017
Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for Markov Chain Monte Carlo Umut Simsekli 58 45 0 12 Jun 2017
The loss surface of deep and wide neural networks Quynh N. Nguyen Matthias Hein ODL 148 284 0 26 Apr 2017
Stochastic Gradient Descent as Approximate Bayesian Inference Stephan Mandt Matthew D. Hoffman David M. Blei BDL 52 597 0 13 Apr 2017
A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics Yuchen Zhang Percy Liang Moses Charikar 61 236 0 18 Feb 2017
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis Maxim Raginsky Alexander Rakhlin Matus Telgarsky 70 521 0 13 Feb 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 415 2,935 0 15 Sep 2016
A Variational Analysis of Stochastic Gradient Algorithms Stephan Mandt Matthew D. Hoffman David M. Blei 50 161 0 08 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.1K 193,426 0 10 Dec 2015
Stochastic modified equations and adaptive stochastic gradient algorithms Qianxiao Li Cheng Tai Weinan E 59 284 0 19 Nov 2015
Practical recommendations for gradient-based training of deep architectures Yoshua Bengio 3DH ODL 185 2,195 0 24 Jun 2012