ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

On the Origin of Implicit Regularization in Stochastic Gradient Descent (arXiv:2101.12176)

28 January 2021
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De
Tags: MLT

Papers citing "On the Origin of Implicit Regularization in Stochastic Gradient Descent"

18 of 18 papers shown
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Liu Liu, ..., Jianfeng Gao, Weizhu Chen, Shuaiqiang Wang, Simon Shaolei Du, Yelong Shen
Tags: OffRL, ReLM, LRM
29 Apr 2025
Harnessing uncertainty when learning through Equilibrium Propagation in neural networks
Jonathan Peters, Philippe Talatchian
28 Mar 2025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
Tags: AI4CE, AAML
15 Oct 2024
Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
10 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews
26 Aug 2024
Neural Redshift: Random Networks are not Random Functions
Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
04 Mar 2024
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Jing Zhu, Xiang Song, V. Ioannidis, Danai Koutra, Christos Faloutsos
25 Sep 2023
Implicit Gradient Regularization
David Barrett, Benoit Dherin
23 Sep 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020
The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
Karthik A. Sankararaman, Soham De, Zheng Xu, Wenjie Huang, Tom Goldstein
Tags: ODL
15 Apr 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
18 Jan 2019
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin
18 Dec 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
Tags: ODL
01 Nov 2017
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, Roland Vollgraf
25 Aug 2017
Stochastic Gradient Descent as Approximate Bayesian Inference
Stephan Mandt, Matthew D. Hoffman, David M. Blei
Tags: BDL
13 Apr 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Tags: ODL
15 Sep 2016