On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016 · arXiv:1609.04836
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Topics: ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 14 of 514 citing papers.
| Title | Authors | Topics | Citations | Date |
|---|---|---|---|---|
| The Marginal Value of Adaptive Gradient Methods in Machine Learning | Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht | ODL | 1,012 | 23 May 2017 |
| Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions | Kleomenis Katevas, Ilias Leontiadis, M. Pielot, Joan Serrà | HAI | 12 | 17 May 2017 |
| Nonlinear Information Bottleneck | Artemy Kolchinsky, Brendan D. Tracey, David Wolpert | | 152 | 06 May 2017 |
| Snapshot Ensembles: Train 1, get M for free | Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, J. Hopcroft, Kilian Q. Weinberger | OOD, FedML, UQCV | 935 | 01 Apr 2017 |
| Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data | Gintare Karolina Dziugaite, Daniel M. Roy | | 799 | 31 Mar 2017 |
| Sharp Minima Can Generalize For Deep Nets | Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio | ODL | 755 | 15 Mar 2017 |
| Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks | Nanyang Ye, Zhanxing Zhu, Rafał K. Mantiuk | | 20 | 13 Mar 2017 |
| Data-Dependent Stability of Stochastic Gradient Descent | Ilja Kuzborskij, Christoph H. Lampert | MLT | 165 | 05 Mar 2017 |
| Incorporating Global Visual Features into Attention-Based Neural Machine Translation | Iacer Calixto, Qun Liu, Nick Campbell | | 154 | 23 Jan 2017 |
| Incremental Sequence Learning | E. Jong | CLL | 5 | 09 Nov 2016 |
| Big Batch SGD: Automated Inference using Adaptive Batch Sizes | Soham De, A. Yadav, David Jacobs, Tom Goldstein | ODL | 62 | 18 Oct 2016 |
| Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability | J. Keuper, Franz-Josef Pfreundt | GNN | 97 | 22 Sep 2016 |
| Parallelizing Word2Vec in Shared and Distributed Memory | Shihao Ji, N. Satish, Sheng Li, Pradeep Dubey | VLM, MoE | 72 | 15 Apr 2016 |
| The Loss Surfaces of Multilayer Networks | A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun | ODL | 1,185 | 30 Nov 2014 |