ResearchTrend.AI

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. [ODL]

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 448 papers shown

- On the Pareto Front of Multilingual Neural Machine Translation (06 Apr 2023). Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang. [MoE]
- Inductive biases in deep learning models for weather prediction (06 Apr 2023). Jannik Thümmel, Matthias Karlbauer, S. Otte, C. Zarfl, Georg Martius, ..., Thomas Scholten, Ulrich Friedrich, V. Wulfmeyer, B. Goswami, Martin Volker Butz. [AI4CE]
- Learning Rate Schedules in the Presence of Distribution Shift (27 Mar 2023). Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah.
- Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation (26 Mar 2023). Tianli Zhang, Mengqi Xue, Jiangtao Zhang, Haofei Zhang, Yu Wang, Lechao Cheng, Jie Song, Mingli Song.
- Mathematical Challenges in Deep Learning (24 Mar 2023). V. Nia, Guojun Zhang, I. Kobyzev, Michael R. Metel, Xinlin Li, ..., S. Hemati, M. Asgharian, Linglong Kong, Wulong Liu, Boxing Chen. [AI4CE, VLM]
- Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization (23 Mar 2023). Zhuo Huang, Miaoxi Zhu, Xiaobo Xia, Li Shen, Jun Yu, Chen Gong, Bo Han, Bo Du, Tongliang Liu.
- Decentralized Adversarial Training over Graphs (23 Mar 2023). Ying Cao, Elsa Rizk, Stefan Vlaski, A. H. Sayed. [AAML]
- Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset (22 Mar 2023). Thanh-Dung Le, P. Jouvet, R. Noumeir. [MoE, MedIm]
- Randomized Adversarial Training via Taylor Expansion (19 Mar 2023). Gao Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, Xiaowei Huang. [AAML]
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift (12 Mar 2023). Francisco Pérez-Galarce, K. Pichara, P. Huijse, M. Catelán, D. Méry.
- Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap (11 Mar 2023). Weiyang Liu, L. Yu, Adrian Weller, Bernhard Schölkopf.
- Revisiting the Noise Model of Stochastic Gradient Descent (05 Mar 2023). Barak Battash, Ofir Lindenbaum.
- ASP: Learn a Universal Neural Solver! (01 Mar 2023). Chenguang Wang, Zhouliang Yu, Stephen Marcus McAleer, Tianshu Yu, Yao-Chun Yang. [AAML]
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks (28 Feb 2023). Samyak Jain, Sravanti Addepalli, P. Sahu, Priyam Dey, R. Venkatesh Babu. [MoMe, OOD]
- mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization (19 Feb 2023). Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. [AAML]
- MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning (18 Feb 2023). Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin.
- Invertible Neural Skinning (18 Feb 2023). Yash Kant, Aliaksandr Siarohin, R. A. Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski. [3DH]
- SAM operates far from home: eigenvalue regularization as a dynamical phenomenon (17 Feb 2023). Atish Agarwala, Yann N. Dauphin.
- The Geometry of Neural Nets' Parameter Spaces Under Reparametrization (14 Feb 2023). Agustinus Kristiadi, Felix Dangel, Philipp Hennig.
- Symbolic Discovery of Optimization Algorithms (13 Feb 2023). Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le.
- Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions (07 Feb 2023). Vladimir Feinberg, Xinyi Chen, Y. Jennifer Sun, Rohan Anil, Elad Hazan.
- Generalization Bounds with Data-dependent Fractal Dimensions (06 Feb 2023). Benjamin Dupuis, George Deligiannidis, Umut Şimşekli. [AI4CE]
- On a continuous time model of gradient descent dynamics and instability in deep learning (03 Feb 2023). Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin.
- Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective (03 Feb 2023). Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun.
- Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning (31 Jan 2023). Antonio Sclocchi, Mario Geiger, M. Wyart.
- The Hidden Power of Pure 16-bit Floating-Point Neural Networks (30 Jan 2023). Juyoung Yun, Byungkon Kang, Zhoulai Fu. [MQ]
- Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization (27 Jan 2023). Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee.
- ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients (26 Jan 2023). Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, R. Marculescu.
- On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems (25 Jan 2023). Philippe Gonzalez, T. S. Alstrøm, Tobias May.
- An SDE for Modeling SAM: Theory and Insights (19 Jan 2023). Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurélien Lucchi.
- Catapult Dynamics and Phase Transitions in Quadratic Nets (18 Jan 2023). David Meltzer, Junyu Liu.
- Stability Analysis of Sharpness-Aware Minimization (16 Jan 2023). Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee.
- Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data (28 Dec 2022). Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu.
- PiPar: Pipeline Parallelism for Collaborative Machine Learning (01 Dec 2022). Zihan Zhang, Philip Rodgers, Peter Kilpatrick, I. Spence, Blesson Varghese. [FedML]
- Task Discovery: Finding the Tasks that Neural Networks Generalize on (01 Dec 2022). Andrei Atanov, Andrei Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir. [OOD]
- Adaptive adversarial training method for improving multi-scale GAN based on generalization bound theory (30 Nov 2022). Jin-Lin Tang, B. Tao, Zeyu Gong, Zhoupin Yin. [AI4CE]
- Boosted Dynamic Neural Networks (30 Nov 2022). Haichao Yu, Haoxiang Li, G. Hua, Gao Huang, Humphrey Shi.
- Disentangling the Mechanisms Behind Implicit Regularization in SGD (29 Nov 2022). Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton. [FedML]
- A survey of deep learning optimizers -- first and second order methods (28 Nov 2022). Rohan Kashyap. [ODL]
- Exploring Temporal Information Dynamics in Spiking Neural Networks (26 Nov 2022). Youngeun Kim, Yuhang Li, Hyoungseob Park, Yeshwanth Venkatesha, Anna Hambitzer, Priyadarshini Panda.
- The Vanishing Decision Boundary Complexity and the Strong First Component (25 Nov 2022). Hengshuai Yao. [UQCV]
- PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices (25 Nov 2022). Kazuki Osawa, Shigang Li, Torsten Hoefler. [AI4CE]
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization (24 Nov 2022). Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, A. Wilson. [BDL, MLT, AI4CE]
- ModelDiff: A Framework for Comparing Learning Algorithms (22 Nov 2022). Harshay Shah, Sung Min Park, Andrew Ilyas, A. Madry. [SyDa]
- REPAIR: REnormalizing Permuted Activations for Interpolation Repair (15 Nov 2022). Keller Jordan, Hanie Sedghi, O. Saukh, R. Entezari, Behnam Neyshabur. [MoMe]
- Towards A Unified Conformer Structure: from ASR to ASV Task (14 Nov 2022). Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Q. Hong.
- How Does Sharpness-Aware Minimization Minimize Sharpness? (10 Nov 2022). Kaiyue Wen, Tengyu Ma, Zhiyuan Li. [AAML]
- Instance-Dependent Generalization Bounds via Optimal Transport (02 Nov 2022). Songyan Hou, Parnian Kassraie, Anastasis Kratsios, Andreas Krause, Jonas Rothfuss.
- Symmetries, flat minima, and the conserved quantities of gradient flow (31 Oct 2022). Bo-Lu Zhao, I. Ganev, Robin G. Walters, Rose Yu, Nima Dehmamy.
- Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models (25 Oct 2022). Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. [AI4CE]