On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
arXiv:1609.04836 [ODL], 15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 of 513 citing papers shown.
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
  Jiaxin Deng, Junbiao Pang, Baochang Zhang (12 Jun 2024)

Agnostic Sharpness-Aware Minimization
  Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Q. Phung, Trung Le (11 Jun 2024)

The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective [AAML]
  Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp (27 May 2024)

MCGAN: Enhancing GAN Training with Regression-Based Generator Loss [GAN]
  Baoren Xiao, Hao Ni, Weixin Yang (27 May 2024)

Does SGD really happen in tiny subspaces?
  Minhak Song, Kwangjun Ahn, Chulhee Yun (25 May 2024)

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data [FedML]
  Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy (22 May 2024)

Why is SAM Robust to Label Noise? [NoLa, AAML]
  Christina Baek, Zico Kolter, Aditi Raghunathan (06 May 2024)

A separability-based approach to quantifying generalization: which layer is best? [OOD]
  Luciano Dyballa, Evan Gerritz, Steven W. Zucker (02 May 2024)

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
  Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen (30 Apr 2024)

Generalization Measures for Zero-Shot Cross-Lingual Transfer
  Saksham Bassi, Duygu Ataman, Kyunghyun Cho (24 Apr 2024)

A Hybrid Generative and Discriminative PointNet on Unordered Point Sets [PINN, 3DPC]
  Yang Ye, Shihao Ji (19 Apr 2024)

Flatness Improves Backbone Generalisation in Few-shot Classification
  Rui Li, Martin Trapp, Marcus Klasson, Arno Solin (11 Apr 2024)

Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
  Lucas Böttcher, Gregory R. Wheeler (05 Apr 2024)

Information-Theoretic Generalization Bounds for Deep Neural Networks
  Haiyun He, Christina Lee Yu (04 Apr 2024)

Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
  Lorenzo Brigato, S. Mougiakakou (08 Mar 2024)

Non-Convex Stochastic Composite Optimization with Polyak Momentum
  Yuan Gao, Anton Rodomanov, Sebastian U. Stich (05 Mar 2024)

Level Set Teleportation: An Optimization Perspective
  Aaron Mishkin, A. Bietti, Robert Mansel Gower (05 Mar 2024)

Merging Text Transformer Models from Different Initializations [MoMe]
  Neha Verma, Maha Elbayad (01 Mar 2024)

Fine-tuning with Very Large Dropout
  Jianyu Zhang, Léon Bottou (01 Mar 2024)

Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
  Jiaxin Deng, Junbiao Pang, Baochang Zhang, Tian Wang (24 Feb 2024)

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size [ODL]
  P. Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov (07 Feb 2024)

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
  Marlon Becker, Frederick Altrock, Benjamin Risse (22 Jan 2024)

EsaCL: Efficient Continual Learning of Sparse Models [CLL]
  Weijieying Ren, V. Honavar (11 Jan 2024)

Preserving Silent Features for Domain Generalization
  Chujie Zhao, Tianren Zhang, Feng Chen (06 Jan 2024)

Doubly Perturbed Task Free Continual Learning
  Byung Hyun Lee, Min-hwan Oh, Se Young Chun (20 Dec 2023)

Sparse is Enough in Fine-tuning Pre-trained Large Language Models [VLM]
  Weixi Song, Z. Li, Lefei Zhang, Hai Zhao, Bo Du (19 Dec 2023)

Directions of Curvature as an Explanation for Loss of Plasticity
  Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado (30 Nov 2023)

Critical Influence of Overparameterization on Sharpness-aware Minimization [AAML]
  Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee (29 Nov 2023)

Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
  Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, Xiaohu You (28 Nov 2023)

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
  Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Qiufeng Wang (22 Nov 2023)

Generalization Bounds for Label Noise Stochastic Gradient Descent
  Jung Eun Huh, Patrick Rebeschini (01 Nov 2023)

Gradient constrained sharpness-aware prompt learning for vision-language models [VLM]
  Liangchen Liu, Nannan Wang, Dawei Zhou, Xinbo Gao, Decheng Liu, Xi Yang, Tongliang Liu (14 Sep 2023)

Split-Boost Neural Networks
  R. G. Cestari, Gabriele Maroni, Loris Cannelli, Dario Piga, Simone Formentin (06 Sep 2023)

Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers [MQ]
  N. Frumkin, Dibakar Gope, Diana Marculescu (21 Aug 2023)

Radiomics-Informed Deep Learning for Classification of Atrial Fibrillation Sub-Types from Left-Atrium CT Volumes
  Weihang Dai, Xiaomeng Li, Taihui Yu, Di Zhao, Jun Shen, Kwang-Ting Cheng (14 Aug 2023)

Arithmetic with Language Models: from Memorization to Computation [KELM, LRM]
  Davide Maltoni, Matteo Ferrara (02 Aug 2023)

Lookbehind-SAM: k steps back, 1 step forward [AAML]
  Gonçalo Mordido, Pranshu Malviya, A. Baratin, Sarath Chandar (31 Jul 2023)

Modify Training Directions in Function Space to Reduce Generalization Error
  Yi Yu, Wenlian Lu, Boyu Chen (25 Jul 2023)

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization [FAtt]
  Kaiyue Wen, Zhiyuan Li, Tengyu Ma (20 Jul 2023)

Sharpness-Aware Graph Collaborative Filtering
  Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang (18 Jul 2023)

Snapshot Spectral Clustering -- a costless approach to deep clustering ensembles generation
  Adam Piróg, Halina Kwasnicka (17 Jul 2023)

Accelerating Distributed ML Training via Selective Synchronization [FedML]
  S. Tyagi, Martin Swany (16 Jul 2023)

The Interpolating Information Criterion for Overparameterized Models
  Liam Hodgkinson, Christopher van der Heide, Roberto Salomone, Fred Roosta, Michael W. Mahoney (15 Jul 2023)

Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems
  Gabriel Mancino-Ball, Yangyang Xu (14 Jul 2023)

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
  Dongkuk Si, Chulhee Yun (16 Jun 2023)

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
  Marcel Kühn, B. Rosenow (08 Jun 2023)

Ghost Noise for Regularizing Deep Neural Networks
  Atli Kosson, Dongyang Fan, Martin Jaggi (26 May 2023)

SING: A Plug-and-Play DNN Learning Technique
  Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli (25 May 2023)

How to escape sharp minima with random perturbations [ODL]
  Kwangjun Ahn, Ali Jadbabaie, S. Sra (25 May 2023)

On the Optimal Batch Size for Byzantine-Robust Distributed Learning [FedML, AAML]
  Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li (23 May 2023)