ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

On the Origin of Implicit Regularization in Stochastic Gradient Descent (arXiv:2101.12176)

28 January 2021
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De
Tags: MLT

Papers citing "On the Origin of Implicit Regularization in Stochastic Gradient Descent"

18 of 18 papers shown
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Liu Liu, ..., Jianfeng Gao, Weizhu Chen, Shuaiqiang Wang, Simon Shaolei Du, Yelong Shen
Tags: OffRL, ReLM, LRM
29 Apr 2025
Harnessing uncertainty when learning through Equilibrium Propagation in neural networks
Jonathan Peters, Philippe Talatchian
28 Mar 2025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
Tags: AI4CE, AAML
15 Oct 2024
Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
10 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews
26 Aug 2024
Neural Redshift: Random Networks are not Random Functions
Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
04 Mar 2024
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Jing Zhu, Xiang Song, V. Ioannidis, Danai Koutra, Christos Faloutsos
25 Sep 2023
Implicit Gradient Regularization
David Barrett, Benoit Dherin
23 Sep 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020
The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
Karthik A. Sankararaman, Soham De, Zheng Xu, Wenjie Huang, Tom Goldstein
Tags: ODL
15 Apr 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
18 Jan 2019
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin
18 Dec 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
Tags: ODL
01 Nov 2017
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, Roland Vollgraf
25 Aug 2017
Stochastic Gradient Descent as Approximate Bayesian Inference
Stephan Mandt, Matthew D. Hoffman, David M. Blei
Tags: BDL
13 Apr 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Tags: ODL
15 Sep 2016