Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
arXiv:1706.02677, 8 June 2017
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" (50 of 2,054 shown)
Parallel Complexity of Forward and Backward Propagation. Maxim Naumov. 18 Dec 2017.
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
Integrated Model, Batch and Domain Parallelism in Training Neural Networks. A. Gholami, A. Azad, Peter H. Jin, Kurt Keutzer, A. Buluç. 12 Dec 2017.
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks. Shankar Krishnan, Ying Xiao, Rif A. Saurous. 08 Dec 2017.
AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training. Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan. 07 Dec 2017.
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks. Aditya Devarakonda, Maxim Naumov, M. Garland. 06 Dec 2017.
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally. 05 Dec 2017.
State-of-the-art Speech Recognition With Sequence-to-Sequence Models. Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, ..., Katya Gonina, Navdeep Jaitly, Yue Liu, J. Chorowski, M. Bacchiani. 05 Dec 2017.
A Closer Look at Spatiotemporal Convolutions for Action Recognition. Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. 30 Nov 2017.
Non-local Neural Networks. Xinyu Wang, Ross B. Girshick, Abhinav Gupta, Kaiming He. 21 Nov 2017.
MegDet: A Large Mini-Batch Object Detector. Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun. 20 Nov 2017.
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning. Ziming Zhang, Yuanwei Wu, Guanghui Wang. 19 Nov 2017.
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. Shaoshuai Shi, Qiang-qiang Wang, Xiaowen Chu. 16 Nov 2017.
AOGNets: Compositional Grammatical Architectures for Deep Learning. Xilai Li, Xi Song, Tianfu Wu. 15 Nov 2017.
Three Factors Influencing Minima in SGD. Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. Takuya Akiba, Shuji Suzuki, Keisuke Fukuda. 12 Nov 2017.
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train. V. Codreanu, Damian Podareanu, V. Saletore. 12 Nov 2017.
Efficient Training of Convolutional Neural Nets on Large Distributed Systems. Sameer Kumar, D. Sreedhar, Vaibhav Saxena, Yogish Sabharwal, Ashish Verma. 02 Nov 2017.
Don't Decay the Learning Rate, Increase the Batch Size. Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le. 01 Nov 2017.
ChainerMN: Scalable Distributed Deep Learning Framework. Takuya Akiba, Keisuke Fukuda, Shuji Suzuki. 31 Oct 2017.
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. Pratik Chaudhari, Stefano Soatto. 30 Oct 2017.
mixup: Beyond Empirical Risk Minimization. Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz. 25 Oct 2017.
Asynchronous Decentralized Parallel Stochastic Gradient Descent. Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu. 18 Oct 2017.
A Bayesian Perspective on Generalization and Stochastic Gradient Descent. Samuel L. Smith, Quoc V. Le. 17 Oct 2017.
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism. Adam Stooke, Pieter Abbeel. 11 Oct 2017.
Slim-DP: A Light Communication Data Parallelism for DNN. Shizhao Sun, Wei-neng Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu. 27 Sep 2017.
Stochastic Nonconvex Optimization with Large Minibatches. Weiran Wang, Nathan Srebro. 25 Sep 2017.
Deep Sparse Subspace Clustering. Xi Peng, Jiashi Feng, Shijie Xiao, Jiwen Lu, Zhang Yi, Shuicheng Yan. 25 Sep 2017.
Online Learning of a Memory for Learning Rates. Franziska Meier, Daniel Kappler, S. Schaal. 20 Sep 2017.
ImageNet Training in Minutes. Yang You, Zhao-jie Zhang, Cho-Jui Hsieh, J. Demmel, Kurt Keutzer. 14 Sep 2017.
What does fault tolerant Deep Learning need from MPI? Vinay C. Amatya, Abhinav Vishnu, Charles Siegel, J. Daily. 11 Sep 2017.
An Adaptive Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation. L. Berger, E. Hyde, M. Jorge Cardoso, Sebastien Ourselin. 08 Sep 2017.
Simple Recurrent Units for Highly Parallelizable Recurrence. Tao Lei, Yu Zhang, Sida I. Wang, Huijing Dai, Yoav Artzi. 08 Sep 2017.
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads. Tian Li, Jie Zhong, Ji Liu, Wentao Wu, Ce Zhang. 24 Aug 2017.
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. L. Smith, Nicholay Topin. 23 Aug 2017.
Large Batch Training of Convolutional Networks. Yang You, Igor Gitman, Boris Ginsburg. 13 Aug 2017.
Distributed Training Large-Scale Deep Architectures. Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang. 10 Aug 2017.
Regularizing and Optimizing LSTM Language Models. Stephen Merity, N. Keskar, R. Socher. 07 Aug 2017.
A Robust Multi-Batch L-BFGS Method for Machine Learning. A. Berahas, Martin Takáč. 26 Jul 2017.
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives. Fartash Faghri, David J. Fleet, J. Kiros, Sanja Fidler. 18 Jul 2017.
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures. Joseph Suárez, Clare Zhu. 08 Jul 2017.
Stochastic, Distributed and Federated Optimization for Machine Learning. Jakub Konecný. 04 Jul 2017.
Parle: parallelizing stochastic gradient descent. Pratik Chaudhari, Carlo Baldassi, R. Zecchina, Stefano Soatto, Ameet Talwalkar, Adam M. Oberman. 03 Jul 2017.
Training a Fully Convolutional Neural Network to Route Integrated Circuits. Sambhav R. Jain, Kye L. Okabe. 27 Jun 2017.
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning. Dong Yin, A. Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter L. Bartlett. 18 Jun 2017.
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks. Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou. 14 Jun 2017.
Training Quantized Nets: A Deeper Understanding. Hao Li, Soham De, Zheng Xu, Christoph Studer, H. Samet, Tom Goldstein. 07 Jun 2017.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Elad Hoffer, Itay Hubara, Daniel Soudry. 24 May 2017.
Insensitive Stochastic Gradient Twin Support Vector Machine for Large Scale Problems. Zhen Wang, Yifei Shao, Lan Bai, Li-Ming Liu, N. Deng. 19 Apr 2017.
Deep Relaxation: partial differential equations for optimizing deep neural networks. Pratik Chaudhari, Adam M. Oberman, Stanley Osher, Stefano Soatto, G. Carlier. 17 Apr 2017.