Implicit Gradient Regularization (arXiv:2009.11162)
David Barrett, Benoit Dherin
23 September 2020
Papers citing "Implicit Gradient Regularization" (35 of 35 papers shown):
- Stochastic Rounding for LLM Training: Theory and Practice. Kaan Ozkara, Tao Yu, Youngsuk Park. 27 Feb 2025.
- Do we really have to filter out random noise in pre-training data for language models? Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou. 10 Feb 2025.
- Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture. Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli. [AI4CE, AAML]. 15 Oct 2024.
- Rethinking Meta-Learning from a Learning Lens. Wenwen Qiang, Jingyao Wang, Chuxiong Sun, Hui Xiong, Jiangmeng Li. 13 Sep 2024.
- Variational Search Distributions. Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla. 10 Sep 2024.
- Input Space Mode Connectivity in Deep Neural Networks. Jakub Vrabel, Ori Shem-Ur, Yaron Oz, David Krueger. 09 Sep 2024.
- Normalization and effective learning rates in reinforcement learning. Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney. 01 Jul 2024.
- A Margin-based Multiclass Generalization Bound via Geometric Complexity. Michael Munn, Benoit Dherin, Javier Gonzalvo. [UQCV]. 28 May 2024.
- Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training. Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin. [AAML]. 15 Jul 2023.
- Learning Trajectories are Generalization Indicators. Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng. [AI4CE]. 25 Apr 2023.
- On a continuous time model of gradient descent dynamics and instability in deep learning. Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin. 03 Feb 2023.
- Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent. Avrajit Ghosh, He Lyu, Xitong Zhang, Rongrong Wang. 02 Feb 2023.
- A Stability Analysis of Fine-Tuning a Pre-Trained Model. Z. Fu, Anthony Man-Cho So, Nigel Collier. 24 Jan 2023.
- Disentangling the Mechanisms Behind Implicit Regularization in SGD. Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton. [FedML]. 29 Nov 2022.
- Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Ziqiao Wang, Yongyi Mao. 19 Nov 2022.
- Why Deep Learning Generalizes. Benjamin L. Badger. [TDI, AI4CE]. 17 Nov 2022.
- Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis. Taiki Miyagawa. 28 Oct 2022.
- Rethinking Sharpness-Aware Minimization as Variational Inference. Szilvia Ujváry, Zsigmond Telek, A. Kerekes, Anna Mészáros, Ferenc Huszár. 19 Oct 2022.
- The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima. Peter L. Bartlett, Philip M. Long, Olivier Bousquet. 04 Oct 2022.
- Why neural networks find simple solutions: the many regularizers of geometric complexity. Benoit Dherin, Michael Munn, M. Rosca, David Barrett. 27 Sep 2022.
- On the Implicit Bias in Deep-Learning Algorithms. Gal Vardi. [FedML, AI4CE]. 26 Aug 2022.
- Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora. [FAtt]. 14 Jun 2022.
- On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity. Vincent Szolnoky, Viktor Andersson, Balázs Kulcsár, Rebecka Jörnsten. 25 May 2022.
- Lassoed Tree Boosting. Alejandro Schuler, Yi Li, Mark van der Laan. 22 May 2022.
- Variational Autoencoders Without the Variation. Gregory A. Daly, J. Fieldsend, G. Tabor. 01 Mar 2022.
- GOSH: Task Scheduling Using Deep Surrogate Models in Fog Computing Environments. Shreshth Tuli, G. Casale, N. Jennings. 16 Dec 2021.
- The Geometric Occam's Razor Implicit in Deep Learning. Benoit Dherin, Michael Munn, David Barrett. 30 Nov 2021.
- Subspace Adversarial Training. Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang. [AAML, OOD]. 24 Nov 2021.
- Understanding Dimensional Collapse in Contrastive Self-supervised Learning. Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian. [SSL]. 18 Oct 2021.
- Sharpness-Aware Minimization Improves Language Model Generalization. Dara Bahri, H. Mobahi, Yi Tay. 16 Oct 2021.
- Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
- Implicit Gradient Alignment in Distributed and Federated Learning. Yatin Dandi, Luis Barba, Martin Jaggi. [FedML]. 25 Jun 2021.
- The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. [ODL]. 04 Mar 2020.
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. [ODL]. 15 Sep 2016.