ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1609.04836
  4. Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
  Minima

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXivPDFHTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 514 papers shown
Title
Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient
  Descent
Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent
Riddhiman Bhattacharya
Tiefeng Jiang
16
0
0
14 Dec 2021
Image-to-Height Domain Translation for Synthetic Aperture Sonar
Image-to-Height Domain Translation for Synthetic Aperture Sonar
Dylan Stewart
Shawn F. Johnson
Alina Zare
21
4
0
12 Dec 2021
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai
Yuhua Zhu
25
4
0
02 Dec 2021
Embedding Principle: a hierarchical structure of loss landscape of deep
  neural networks
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang
Yuqing Li
Zhongwang Zhang
Tao Luo
Z. Xu
26
21
0
30 Nov 2021
Local Learning Matters: Rethinking Data Heterogeneity in Federated
  Learning
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
Matías Mendieta
Taojiannan Yang
Pu Wang
Minwoo Lee
Zhengming Ding
Cheng Chen
FedML
24
158
0
28 Nov 2021
Impact of classification difficulty on the weight matrices spectra in
  Deep Learning and application to early-stopping
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
22
7
0
26 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
27
24
0
24 Nov 2021
TransMorph: Transformer for unsupervised medical image registration
TransMorph: Transformer for unsupervised medical image registration
Junyu Chen
Eric C. Frey
Yufan He
W. Paul Segars
Ye Li
Yong Du
ViT
MedIm
36
302
0
19 Nov 2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent:
  Convergence Guarantees and Empirical Benefits
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits
Hao Chen
Lili Zheng
Raed Al Kontar
Garvesh Raskutti
20
3
0
19 Nov 2021
Papaya: Practical, Private, and Scalable Federated Learning
Papaya: Practical, Private, and Scalable Federated Learning
Dzmitry Huba
John Nguyen
Kshitiz Malik
Ruiyu Zhu
Michael G. Rabbat
...
H. Srinivas
Kaikai Wang
Anthony Shoumikhin
Jesik Min
Mani Malek
FedML
110
137
0
08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary
  regime
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
28
4
0
07 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
27
14
0
01 Nov 2021
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both
  Homophily and Heterophily
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily
Lun Du
Xiaozhou Shi
Qiang Fu
Xiaojun Ma
Hengyu Liu
Shi Han
Dongmei Zhang
40
104
0
29 Oct 2021
RoMA: Robust Model Adaptation for Offline Model-based Optimization
RoMA: Robust Model Adaptation for Offline Model-based Optimization
Sihyun Yu
Sungsoo Ahn
Le Song
Jinwoo Shin
OffRL
27
31
0
27 Oct 2021
Stable Anderson Acceleration for Deep Learning
Stable Anderson Acceleration for Deep Learning
Massimiliano Lupo Pasini
Junqi Yin
Viktor Reshniak
M. Stoyanov
15
4
0
26 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
127
98
0
16 Oct 2021
Trade-offs of Local SGD at Scale: An Empirical Study
Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz
Jonathan Frankle
Michael G. Rabbat
Ari S. Morcos
Nicolas Ballas
FedML
37
19
0
15 Oct 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural
  Networks
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
R. Entezari
Hanie Sedghi
O. Saukh
Behnam Neyshabur
MoMe
37
216
0
12 Oct 2021
Not all noise is accounted equally: How differentially private learning
  benefits from large sampling rates
Not all noise is accounted equally: How differentially private learning benefits from large sampling rates
Friedrich Dörmann
Osvald Frisk
L. Andersen
Christian Fischer Pedersen
FedML
59
25
0
12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic
  Differential Equations
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
32
7
0
11 Oct 2021
Observations on K-image Expansion of Image-Mixing Augmentation for
  Classification
Observations on K-image Expansion of Image-Mixing Augmentation for Classification
Joonhyun Jeong
Sungmin Cha
Jongwon Choi
Sangdoo Yun
Taesup Moon
Y. Yoo
VLM
21
6
0
08 Oct 2021
Label Noise in Adversarial Training: A Novel Perspective to Study Robust
  Overfitting
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting
Chengyu Dong
Liyuan Liu
Jingbo Shang
NoLa
AAML
56
18
0
07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic
  Bounds and Implications
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
37
22
0
07 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in
  Generalization
Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil
Raphael Gontijo-Lopes
Rebecca Roelofs
41
28
0
06 Oct 2021
Perturbated Gradients Updating within Unit Space for Deep Learning
Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng
Liu Cheng
Shin-Jye Lee
Xiaojun Zeng
40
5
0
01 Oct 2021
Accelerating Encrypted Computing on Intel GPUs
Accelerating Encrypted Computing on Intel GPUs
Yujia Zhai
Mohannad Ibrahim
Yiqin Qiu
Fabian Boemer
Zizhong Chen
Alexey Titov
Alexander Lyashevsky
26
26
0
29 Sep 2021
Stochastic Training is Not Necessary for Generalization
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
89
72
0
29 Sep 2021
Scalable deeper graph neural networks for high-performance materials
  property prediction
Scalable deeper graph neural networks for high-performance materials property prediction
Sadman Sadeed Omee
Steph-Yves M. Louis
Nihang Fu
Lai Wei
Sourin Dey
Rongzhi Dong
Qinyang Li
Jianjun Hu
70
73
0
25 Sep 2021
Towards Generalized and Incremental Few-Shot Object Detection
Towards Generalized and Incremental Few-Shot Object Detection
Yiting Li
H. Zhu
Jun Ma
C. Teo
Chen Xiang
P. Vadakkepat
T. Lee
CLL
ObjD
26
9
0
23 Sep 2021
DHA: End-to-End Joint Optimization of Data Augmentation Policy,
  Hyper-parameter and Architecture
DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture
Kaichen Zhou
Lanqing Hong
Shuailiang Hu
Fengwei Zhou
Binxin Ru
Jiashi Feng
Zhenguo Li
56
10
0
13 Sep 2021
MLReal: Bridging the gap between training on synthetic data and real
  data applications in machine learning
MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
T. Alkhalifah
Hanchen Wang
O. Ovcharenko
OOD
47
65
0
11 Sep 2021
Adversarial Parameter Defense by Multi-Step Risk Minimization
Adversarial Parameter Defense by Multi-Step Risk Minimization
Zhiyuan Zhang
Ruixuan Luo
Xuancheng Ren
Qi Su
Liangyou Li
Xu Sun
AAML
25
6
0
07 Sep 2021
How to Inject Backdoors with Better Consistency: Logit Anchoring on
  Clean Data
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang
Lingjuan Lyu
Weiqiang Wang
Lichao Sun
Xu Sun
21
35
0
03 Sep 2021
Shift-Curvature, SGD, and Generalization
Shift-Curvature, SGD, and Generalization
Arwen V. Bradley
C. Gomez-Uribe
Manish Reddy Vuyyuru
35
2
0
21 Aug 2021
Learning from Images: Proactive Caching with Parallel Convolutional
  Neural Networks
Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks
Yantong Wang
Ye Hu
Zhaohui Yang
Walid Saad
Kai‐Kit Wong
V. Friderikos
23
4
0
15 Aug 2021
Logit Attenuating Weight Normalization
Logit Attenuating Weight Normalization
Aman Gupta
R. Ramanath
Jun Shi
Anika Ramachandran
Sirou Zhou
Mingzhou Zhou
S. Keerthi
37
1
0
12 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep
  Learning Workloads in GPU Clusters
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun
Shenggui Li
Jinyue Wang
Jun Yu
54
47
0
08 Aug 2021
Batch Normalization Preconditioning for Neural Network Training
Batch Normalization Preconditioning for Neural Network Training
Susanna Lange
Kyle E. Helfrich
Qiang Ye
27
9
0
02 Aug 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural
  Networks: A Tale of Symmetry II
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani
M. Field
28
18
0
21 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations,
  and Anomalous Diffusion
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin
Javier Sagastuy-Breña
Lauren Gillespie
Eshed Margalit
Hidenori Tanaka
Surya Ganguli
Daniel L. K. Yamins
31
15
0
19 Jul 2021
Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering
Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering
Nairouz Mrabah
Mohamed Bouguessa
M. Touati
Riadh Ksantini
35
62
0
19 Jul 2021
Point-Cloud Deep Learning of Porous Media for Permeability Prediction
Point-Cloud Deep Learning of Porous Media for Permeability Prediction
Ali Kashefi
T. Mukerji
3DPC
AI4CE
17
34
0
18 Jul 2021
The Bayesian Learning Rule
The Bayesian Learning Rule
Mohammad Emtiyaz Khan
Håvard Rue
BDL
63
73
0
09 Jul 2021
Activated Gradients for Deep Neural Networks
Activated Gradients for Deep Neural Networks
Mei Liu
Liangming Chen
Xiaohao Du
Long Jin
Mingsheng Shang
ODL
AI4CE
27
135
0
09 Jul 2021
What can linear interpolation of neural network loss landscapes tell us?
What can linear interpolation of neural network loss landscapes tell us?
Tiffany J. Vlaar
Jonathan Frankle
MoMe
27
27
0
30 Jun 2021
Implicit Gradient Alignment in Distributed and Federated Learning
Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi
Luis Barba
Martin Jaggi
FedML
23
31
0
25 Jun 2021
Sparse Flows: Pruning Continuous-depth Models
Sparse Flows: Pruning Continuous-depth Models
Lucas Liebenwein
Ramin Hasani
Alexander Amini
Daniela Rus
26
16
0
24 Jun 2021
Minimum sharpness: Scale-invariant parameter-robustness of neural
  networks
Minimum sharpness: Scale-invariant parameter-robustness of neural networks
Hikaru Ibayashi
Takuo Hamaguchi
Masaaki Imaizumi
25
5
0
23 Jun 2021
Deep Learning Through the Lens of Example Difficulty
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
47
156
0
17 Jun 2021
On Large-Cohort Training for Federated Learning
On Large-Cohort Training for Federated Learning
Zachary B. Charles
Zachary Garrett
Zhouyuan Huo
Sergei Shmulyian
Virginia Smith
FedML
21
113
0
15 Jun 2021
Previous
123...567...91011
Next