Stochastic Training is Not Necessary for Generalization (arXiv 2109.14119)
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 September 2021
Papers citing "Stochastic Training is Not Necessary for Generalization" (25 of 25 shown)
Gradient Descent as a Shrinkage Operator for Spectral Bias
Simon Lucey · 25 Apr 2025 · 38 / 0 / 0

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024 · 55 / 1 / 0

Just How Flexible are Neural Networks in Practice?
Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. B. Bruss, Yann LeCun, Andrew Gordon Wilson · 17 Jun 2024 · 40 / 4 / 0

Enhancing Domain Adaptation through Prompt Gradient Alignment
Hoang Phan, Lam C. Tran, Quyen Tran, Trung Le · 13 Jun 2024 · 52 / 0 / 0

Neural Redshift: Random Networks are not Random Functions
Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad · 04 Mar 2024 · 94 / 18 / 0

Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training
Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin · AAML · 15 Jul 2023 · 22 / 21 / 0

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry · 18 Jun 2023 · 25 / 2 / 0

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow · 08 Jun 2023 · 11 / 3 / 0

Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis · UQCV, BDL · 13 Apr 2023 · 19 / 15 / 0

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder · AAML · 19 Feb 2023 · 15 / 7 / 0

U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Bryn Elesedy, Marcus Hutter · 06 Feb 2023 · 11 / 1 / 0

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin · 03 Feb 2023 · 16 / 6 / 0

Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton · FedML · 29 Nov 2022 · 27 / 2 / 0

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, A. Wilson · BDL, MLT, AI4CE · 24 Nov 2022 · 24 / 51 / 0

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao · 19 Nov 2022 · 21 / 10 / 0

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, A. Wilson · 12 Oct 2022 · 26 / 35 / 0

Deep Double Descent via Smooth Interpolation
Matteo Gamba, Erik Englesson, Marten Bjorkman, Hossein Azizpour · 21 Sep 2022 · 59 / 10 / 0

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias · 26 Apr 2022 · 34 / 17 / 0

Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
Jiayuan Ye, Reza Shokri · FedML · 10 Mar 2022 · 22 / 44 / 0

When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner · ODL · 01 Feb 2022 · 11 / 58 / 0

Subspace Adversarial Training
Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang · AAML, OOD · 24 Nov 2021 · 36 / 56 / 0

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi · 12 Aug 2021 · 34 / 1 / 0

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · ODL · 04 Mar 2020 · 159 / 234 / 0

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li · 04 Dec 2018 · 221 / 1,399 / 0

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016 · 281 / 2,888 / 0