On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 447 papers shown

Title
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning Lianbo Ma Jianlun Ma Yuee Zhou Guoyang Xie Qiang He Zhichao Lu MQ 43 0 0 08 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks Juyoung Yun 38 0 0 05 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification Sicong Li Qianqian Xu Zhiyong Yang Zitai Wang L. Zhang Xiaochun Cao Q. Huang 67 0 0 03 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Yiping Wang Qing Yang Zhiyuan Zeng Liliang Ren L. Liu ... Jianfeng Gao Weizhu Chen S. Wang Simon S. Du Yelong Shen OffRL ReLM LRM 118 2 0 29 Apr 2025
Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks Konstantinos I Roumeliotis Ranjan Sapkota Manoj Karkee Nikolaos D. Tselikas Dimitrios K. Nasiopoulos 44 0 0 29 Apr 2025
The effect of the number of parameters and the number of local feature patches on loss landscapes in distributed quantum neural networks Yoshiaki Kawase 71 0 0 27 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training Hiroki Naganuma Xinzhi Zhang Man-Chung Yue Ioannis Mitliagkas Philipp A. Witte Russell J. Hewett Yin Tat Lee 63 0 0 25 Apr 2025
Param $Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost Sheng Cao Mingrui Wu Karthik Prasad Yuandong Tian Zechun Liu MoMe 80 0 0 23 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning Saber Malekmohammadi Hong kyu Lee Li Xiong MU 143 0 0 08 Apr 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning Sunwoo Lee 98 0 0 18 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability Entao Yang X. Zhang Yue Shang Ge Zhang AI4CE 58 0 0 17 Mar 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity Keyao Zhan Puheng Li Lei Wu MoMe 77 0 0 13 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting Linqi Yang Xiongwei Zhao Qihao Sun Ke Wang Ao Chen Peng Kang 3DGS 76 2 0 07 Mar 2025
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization Jake B. Perazzone Shiqiang Wang Mingyue Ji Kevin S. Chan FedML 63 0 0 01 Mar 2025
Reasoning Bias of Next Token Prediction Training Pengxiao Lin Zhongwang Zhang Zhi-Qin John Xu LRM 80 1 0 21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors Milad Sefidgaran A. Zaidi Piotr Krasnowski 83 1 0 21 Feb 2025
On Memorization in Diffusion Models Xiangming Gu Chao Du Tianyu Pang Chongxuan Li Min-Bin Lin Ye Wang DiffM TDI 166 43 0 21 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective Pin-Yu Chen 68 1 0 18 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks Bingheng Li Z. Chen Haoyu Han Shenglai Zeng J. Liu Jiliang Tang 45 0 0 18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos Dayal Singh Kalra Tianyu He M. Barkeshli 47 4 0 17 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning Rémy Hosseinkhan Boucher Onofrio Semeraro L. Mathelin 74 0 0 28 Jan 2025
Evolutionary Optimization of Model Merging Recipes Takuya Akiba Makoto Shing Yujin Tang Qi Sun David Ha MoMe 105 99 0 28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials Jorge Torre Suset Barroso-Solares M.A. Rodríguez-Pérez Javier Pinto 36 5 0 25 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff Freek Holvoet Katrien Antonio Roel Henckaerts 91 3 0 20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks Pierfrancesco Beneventano Blake Woodworth MLT 34 1 0 15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit Oleg Filatov Jan Ebert Jiangtao Wang Stefan Kesselheim 36 3 0 10 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism Tim Tsz-Kit Lau Weijian Li Chenwei Xu Han Liu Mladen Kolar 138 0 0 30 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration D. Wolf Alexander Braun Markus Ulrich 89 0 0 18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes Aodi Li Liansheng Zhuang Xiao Long Minghong Yao Shafei Wang 180 0 0 18 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization Z. Chen Yiwen Ye Feilong Tang Yongsheng Pan Yong-quan Xia BDL 188 1 0 16 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks Jim Zhao Sidak Pal Singh Aurélien Lucchi AI4CE 39 0 0 04 Nov 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts R. Teo Tan M. Nguyen MoE 31 3 0 18 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis Weronika Ormaniec Felix Dangel Sidak Pal Singh 33 6 0 14 Oct 2024
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training Zhanpeng Zhou Mingze Wang Yuchen Mao Bingrui Li Junchi Yan AAML 59 0 0 14 Oct 2024
OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement Qinglun Li Miao Zhang Mengzhu Wang Quanjun Yin Li Shen OODD FedML 19 0 0 09 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization Saqib Javed Hieu Le Mathieu Salzmann OOD MQ 28 1 0 08 Oct 2024
Extended convexity and smoothness and their applications in deep learning Binchuan Qi Wei Gong Li Li 61 0 0 08 Oct 2024
Incremental Learning for Robot Shared Autonomy Yiran Tao Guixiu Qiao Dan Ding Zackory Erickson CLL 30 0 0 08 Oct 2024
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness Boqian Wu Q. Xiao Shunxin Wang N. Strisciuglio Mykola Pechenizkiy M. V. Keulen D. Mocanu Elena Mocanu OOD 3DH 52 0 0 03 Oct 2024
Revisiting Video Quality Assessment from the Perspective of Generalization Xinli Yue Jianhui Sun Liangchao Yao Fan Xia Yuetang Deng ... Lei Li Fengyun Rao Jing Lv Qian Wang Lingchen Zhao MoMe 23 0 0 23 Sep 2024
Efficient Training of Deep Neural Operator Networks via Randomized Sampling Sharmila Karumuri Lori Graham-Brady Somdatta Goswami 28 1 0 20 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller Mark Dredze Nicholas Andrews 53 1 0 26 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition Kenzo Clauw S. Stramaglia Daniele Marinazzo 50 3 0 16 Aug 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis Stefan Horoi Albert Manuel Orozco Camacho Eugene Belilovsky Guy Wolf FedML MoMe 27 9 0 07 Jul 2024
Simplifying Deep Temporal Difference Learning Matteo Gallici Mattie Fellows Benjamin Ellis B. Pou Ivan Masmitja Jakob Foerster Mario Martin OffRL 57 14 0 05 Jul 2024
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization Hongjun Choi Jayaraman J. Thiagarajan Ruben Glatt Shusen Liu 43 0 0 29 Jun 2024
Improving robustness to corruptions with multiplicative weight perturbations Trung Trinh Markus Heinonen Luigi Acerbi Samuel Kaski 38 0 0 24 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD Pierfrancesco Beneventano Andrea Pinto Tomaso A. Poggio MLT 27 1 0 17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions? Weijie Tu Weijian Deng Liang Zheng Tom Gedeon 34 0 0 14 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization Jiaxin Deng Junbiao Pang Baochang Zhang 64 1 0 12 Jun 2024