1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 505 papers shown
Title
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
Gabriel S. Gama
Valdir Grassi Jr
MoMe
45
0
0
15 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma
Jianlun Ma
Yuee Zhou
Guoyang Xie
Qiang He
Zhichao Lu
MQ
45
0
0
08 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks
Juyoung Yun
38
0
0
05 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li
Qianqian Xu
Zhiyong Yang
Zitai Wang
Li Zhang
Xiaochun Cao
Q. Huang
67
0
0
03 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
118
4
0
29 Apr 2025
Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks
Konstantinos I Roumeliotis
Ranjan Sapkota
Manoj Karkee
Nikolaos D. Tselikas
Dimitrios K. Nasiopoulos
44
0
0
29 Apr 2025
The effect of the number of parameters and the number of local feature patches on loss landscapes in distributed quantum neural networks
Yoshiaki Kawase
73
0
0
27 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
63
0
0
25 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
80
0
0
23 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi
Hong kyu Lee
Li Xiong
MU
157
0
0
08 Apr 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee
109
0
0
18 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Xuzhi Zhang
Yue Shang
Ge Zhang
AI4CE
63
0
0
17 Mar 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
82
0
0
13 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
80
0
0
07 Mar 2025
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Shiqiang Wang
Mingyue Ji
Kevin S. Chan
FedML
75
0
0
01 Mar 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Milad Sefidgaran
A. Zaidi
Piotr Krasnowski
91
1
0
21 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min-Bin Lin
Ye Wang
DiffM
TDI
166
43
0
21 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
94
2
0
21 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
76
1
0
18 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Jiliang Tang
48
0
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
49
4
0
17 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
79
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
116
100
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
46
5
0
25 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
Freek Holvoet
Katrien Antonio
Roel Henckaerts
101
3
0
20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
36
1
0
15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
36
3
0
10 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
147
0
0
30 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
D. Wolf
Alexander Braun
Markus Ulrich
89
0
0
18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
186
0
0
18 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhe Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
197
1
0
16 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
45
0
0
04 Nov 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
33
3
0
18 Oct 2024
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
62
0
0
14 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
33
6
0
14 Oct 2024
OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement
Qinglun Li
Miao Zhang
Mengzhu Wang
Quanjun Yin
Li Shen
OODD
FedML
24
0
0
09 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD
MQ
28
1
0
08 Oct 2024
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
61
0
0
08 Oct 2024
Incremental Learning for Robot Shared Autonomy
Yiran Tao
Guixiu Qiao
Dan Ding
Zackory Erickson
CLL
35
0
0
08 Oct 2024
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu
Q. Xiao
Shunxin Wang
N. Strisciuglio
Mykola Pechenizkiy
M. V. Keulen
D. Mocanu
Elena Mocanu
OOD
3DH
52
0
0
03 Oct 2024
Revisiting Video Quality Assessment from the Perspective of Generalization
Xinli Yue
Jianhui Sun
Liangchao Yao
Fan Xia
Yuetang Deng
...
Lei Li
Fengyun Rao
Jing Lv
Qian Wang
Lingchen Zhao
MoMe
32
0
0
23 Sep 2024
Efficient Training of Deep Neural Operator Networks via Randomized Sampling
Sharmila Karumuri
Lori Graham-Brady
Somdatta Goswami
31
1
0
20 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
58
1
0
26 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
50
3
0
16 Aug 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Eugene Belilovsky
Guy Wolf
FedML
MoMe
29
9
0
07 Jul 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
62
14
0
05 Jul 2024
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization
Hongjun Choi
Jayaraman J. Thiagarajan
Ruben Glatt
Shusen Liu
43
0
0
29 Jun 2024
Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski
44
0
0
24 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
32
1
0
17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
40
0
0
14 Jun 2024