1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 514 papers shown
Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent
Riddhiman Bhattacharya
Tiefeng Jiang
16
0
0
14 Dec 2021
Image-to-Height Domain Translation for Synthetic Aperture Sonar
Dylan Stewart
Shawn F. Johnson
Alina Zare
21
4
0
12 Dec 2021
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai
Yuhua Zhu
25
4
0
02 Dec 2021
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang
Yuqing Li
Zhongwang Zhang
Tao Luo
Z. Xu
26
21
0
30 Nov 2021
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
Matías Mendieta
Taojiannan Yang
Pu Wang
Minwoo Lee
Zhengming Ding
Cheng Chen
FedML
24
158
0
28 Nov 2021
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
22
7
0
26 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
27
24
0
24 Nov 2021
TransMorph: Transformer for unsupervised medical image registration
Junyu Chen
Eric C. Frey
Yufan He
W. Paul Segars
Ye Li
Yong Du
ViT
MedIm
36
302
0
19 Nov 2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits
Hao Chen
Lili Zheng
Raed Al Kontar
Garvesh Raskutti
20
3
0
19 Nov 2021
Papaya: Practical, Private, and Scalable Federated Learning
Dzmitry Huba
John Nguyen
Kshitiz Malik
Ruiyu Zhu
Michael G. Rabbat
...
H. Srinivas
Kaikai Wang
Anthony Shoumikhin
Jesik Min
Mani Malek
FedML
110
137
0
08 Nov 2021
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi
Masaaki Imaizumi
28
4
0
07 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
27
14
0
01 Nov 2021
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily
Lun Du
Xiaozhou Shi
Qiang Fu
Xiaojun Ma
Hengyu Liu
Shi Han
Dongmei Zhang
40
104
0
29 Oct 2021
RoMA: Robust Model Adaptation for Offline Model-based Optimization
Sihyun Yu
Sungsoo Ahn
Le Song
Jinwoo Shin
OffRL
27
31
0
27 Oct 2021
Stable Anderson Acceleration for Deep Learning
Massimiliano Lupo Pasini
Junqi Yin
Viktor Reshniak
M. Stoyanov
15
4
0
26 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
127
98
0
16 Oct 2021
Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz
Jonathan Frankle
Michael G. Rabbat
Ari S. Morcos
Nicolas Ballas
FedML
37
19
0
15 Oct 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
R. Entezari
Hanie Sedghi
O. Saukh
Behnam Neyshabur
MoMe
37
216
0
12 Oct 2021
Not all noise is accounted equally: How differentially private learning benefits from large sampling rates
Friedrich Dörmann
Osvald Frisk
L. Andersen
Christian Fischer Pedersen
FedML
59
25
0
12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
32
7
0
11 Oct 2021
Observations on K-image Expansion of Image-Mixing Augmentation for Classification
Joonhyun Jeong
Sungmin Cha
Jongwon Choi
Sangdoo Yun
Taesup Moon
Y. Yoo
VLM
21
6
0
08 Oct 2021
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting
Chengyu Dong
Liyuan Liu
Jingbo Shang
NoLa
AAML
56
18
0
07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
37
22
0
07 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil
Raphael Gontijo-Lopes
Rebecca Roelofs
41
28
0
06 Oct 2021
Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng
Liu Cheng
Shin-Jye Lee
Xiaojun Zeng
40
5
0
01 Oct 2021
Accelerating Encrypted Computing on Intel GPUs
Yujia Zhai
Mohannad Ibrahim
Yiqin Qiu
Fabian Boemer
Zizhong Chen
Alexey Titov
Alexander Lyashevsky
26
26
0
29 Sep 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
89
72
0
29 Sep 2021
Scalable deeper graph neural networks for high-performance materials property prediction
Sadman Sadeed Omee
Steph-Yves M. Louis
Nihang Fu
Lai Wei
Sourin Dey
Rongzhi Dong
Qinyang Li
Jianjun Hu
70
73
0
25 Sep 2021
Towards Generalized and Incremental Few-Shot Object Detection
Yiting Li
H. Zhu
Jun Ma
C. Teo
Chen Xiang
P. Vadakkepat
T. Lee
CLL
ObjD
26
9
0
23 Sep 2021
DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture
Kaichen Zhou
Lanqing Hong
Shuailiang Hu
Fengwei Zhou
Binxin Ru
Jiashi Feng
Zhenguo Li
56
10
0
13 Sep 2021
MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
T. Alkhalifah
Hanchen Wang
O. Ovcharenko
OOD
47
65
0
11 Sep 2021
Adversarial Parameter Defense by Multi-Step Risk Minimization
Zhiyuan Zhang
Ruixuan Luo
Xuancheng Ren
Qi Su
Liangyou Li
Xu Sun
AAML
25
6
0
07 Sep 2021
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang
Lingjuan Lyu
Weiqiang Wang
Lichao Sun
Xu Sun
21
35
0
03 Sep 2021
Shift-Curvature, SGD, and Generalization
Arwen V. Bradley
C. Gomez-Uribe
Manish Reddy Vuyyuru
35
2
0
21 Aug 2021
Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks
Yantong Wang
Ye Hu
Zhaohui Yang
Walid Saad
Kai‐Kit Wong
V. Friderikos
23
4
0
15 Aug 2021
Logit Attenuating Weight Normalization
Aman Gupta
R. Ramanath
Jun Shi
Anika Ramachandran
Sirou Zhou
Mingzhou Zhou
S. Keerthi
37
1
0
12 Aug 2021
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun
Shenggui Li
Jinyue Wang
Jun Yu
54
47
0
08 Aug 2021
Batch Normalization Preconditioning for Neural Network Training
Susanna Lange
Kyle E. Helfrich
Qiang Ye
27
9
0
02 Aug 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani
M. Field
28
18
0
21 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin
Javier Sagastuy-Breña
Lauren Gillespie
Eshed Margalit
Hidenori Tanaka
Surya Ganguli
Daniel L. K. Yamins
31
15
0
19 Jul 2021
Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering
Nairouz Mrabah
Mohamed Bouguessa
M. Touati
Riadh Ksantini
35
62
0
19 Jul 2021
Point-Cloud Deep Learning of Porous Media for Permeability Prediction
Ali Kashefi
T. Mukerji
3DPC
AI4CE
17
34
0
18 Jul 2021
The Bayesian Learning Rule
Mohammad Emtiyaz Khan
Håvard Rue
BDL
63
73
0
09 Jul 2021
Activated Gradients for Deep Neural Networks
Mei Liu
Liangming Chen
Xiaohao Du
Long Jin
Mingsheng Shang
ODL
AI4CE
27
135
0
09 Jul 2021
What can linear interpolation of neural network loss landscapes tell us?
Tiffany J. Vlaar
Jonathan Frankle
MoMe
27
27
0
30 Jun 2021
Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi
Luis Barba
Martin Jaggi
FedML
23
31
0
25 Jun 2021
Sparse Flows: Pruning Continuous-depth Models
Lucas Liebenwein
Ramin Hasani
Alexander Amini
Daniela Rus
26
16
0
24 Jun 2021
Minimum sharpness: Scale-invariant parameter-robustness of neural networks
Hikaru Ibayashi
Takuo Hamaguchi
Masaaki Imaizumi
25
5
0
23 Jun 2021
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
47
156
0
17 Jun 2021
On Large-Cohort Training for Federated Learning
Zachary B. Charles
Zachary Garrett
Zhouyuan Huo
Sergei Shmulyian
Virginia Smith
FedML
21
113
0
15 Jun 2021