1706.02677
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization
Bowen Cheng
Yunchao Wei
Jiahui Yu
Shiyu Chang
Jinjun Xiong
Wen-mei W. Hwu
Thomas S. Huang
Humphrey Shi
OOD
VLM
113
6
0
23 Nov 2018
Rethinking ImageNet Pre-training
Kaiming He
Ross B. Girshick
Piotr Dollár
VLM
SSeg
141
1,088
0
21 Nov 2018
Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?
Ping Luo
Zhanglin Peng
Jiamin Ren
Ruimao Zhang
FAtt
OOD
53
7
0
19 Nov 2018
Batch DropBlock Network for Person Re-identification and Beyond
Zuozhuo Dai
Mingqiang Chen
Xiaodong Gu
Siyu Zhu
Ping Tan
OOD
98
247
0
17 Nov 2018
Image Classification at Supercomputer Scale
Chris Ying
Sameer Kumar
Dehao Chen
Tao Wang
Youlong Cheng
VLM
77
123
0
16 Nov 2018
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang
Youlong Cheng
Ankur Bapna
Orhan Firat
Mia Xu Chen
...
HyoukJoong Lee
Jiquan Ngiam
Quoc V. Le
Yonghui Wu
Zhifeng Chen
GNN
MoE
52
7
0
16 Nov 2018
Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami
Hisahiro Suganuma
Pongsakorn U-chupala
Yoshiki Tanaka
Yuichi Kageyama
89
77
0
13 Nov 2018
Importance Weighted Evolution Strategies
Victor Campos
Xavier Giró-i-Nieto
Jordi Torres
41
1
0
12 Nov 2018
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Youjie Li
Hang Qiu
Songze Li
A. Avestimehr
Nam Sung Kim
Alex Schwing
FedML
118
104
0
08 Nov 2018
GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Timo C. Wunderlich
Zhifeng Lin
S. A. Aamir
Andreas Grübl
Youjie Li
David Stöckel
Alex Schwing
M. Annavaram
A. Avestimehr
MQ
47
64
0
08 Nov 2018
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
117
409
0
08 Nov 2018
Democratizing Production-Scale Distributed Deep Learning
Minghuang Ma
Hadi Pouransari
Daniel Chao
Saurabh N. Adya
S. Serrano
Yi Qin
Dan Gimnicher
Dominic Walsh
MoE
110
6
0
31 Oct 2018
Accelerating SGD with momentum for over-parameterized learning
Chaoyue Liu
M. Belkin
ODL
110
19
0
31 Oct 2018
Compact Generalized Non-local Network
Kaiyu Yue
Ming Sun
Yuchen Yuan
Feng Zhou
Errui Ding
Fuxin Xu
83
162
0
31 Oct 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
ODL
105
277
0
29 Oct 2018
A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
K. Chahal
Manraj Singh Grover
Kuntal Dey
3DH
OOD
90
54
0
28 Oct 2018
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
A. A. Awan
Jeroen Bédorf
Ching-Hsiang Chu
Hari Subramoni
D. Panda
GNN
61
45
0
25 Oct 2018
Batch Normalization Sampling
Zhaodong Chen
Lei Deng
Guoqi Li
Jiawei Sun
Xing Hu
Xin Ma
Yuan Xie
41
0
0
25 Oct 2018
Language Modeling at Scale
Md. Mostofa Ali Patwary
Milind Chabbi
Heewoo Jun
Jiaji Huang
G. Diamos
Kenneth Church
ALM
41
5
0
23 Oct 2018
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
110
232
0
19 Oct 2018
Distributed Learning over Unreliable Networks
Chen Yu
Hanlin Tang
Cédric Renggli
S. Kassing
Ankit Singla
Dan Alistarh
Ce Zhang
Ji Liu
OOD
105
61
0
17 Oct 2018
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Sharan Vaswani
Francis R. Bach
Mark Schmidt
119
301
0
16 Oct 2018
Quasi-hyperbolic momentum and Adam for deep learning
Jerry Ma
Denis Yarats
ODL
165
130
0
16 Oct 2018
Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao
Tom Drummond
Ian Reid
G. Carneiro
80
23
0
16 Oct 2018
A System for Massively Parallel Hyperparameter Tuning
Liam Li
Kevin Jamieson
Afshin Rostamizadeh
Ekaterina Gonina
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
101
387
0
13 Oct 2018
Toward Understanding the Impact of Staleness in Distributed Machine Learning
Wei-Ming Dai
Yi Zhou
Nanqing Dong
Huatian Zhang
Eric Xing
67
82
0
08 Oct 2018
Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection
Bowen Cheng
Yunchao Wei
Rogerio Feris
Jinjun Xiong
Wen-mei W. Hwu
Thomas Huang
Humphrey Shi
ObjD
69
48
0
05 Oct 2018
Multi-view X-ray R-CNN
Jan-Martin O. Steitz
Faraz Saeedan
Stefan Roth
56
25
0
04 Oct 2018
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin
Michael W. Mahoney
AI4CE
137
201
0
02 Oct 2018
Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
ODL
96
42
0
02 Oct 2018
Dynamic Sparse Graph for Efficient Deep Learning
Liu Liu
Lei Deng
Xing Hu
Maohua Zhu
Guoqi Li
Yufei Ding
Yuan Xie
GNN
90
42
0
01 Oct 2018
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse
Sangkug Lym
Armand Behroozi
W. Wen
Ge Li
Yongkee Kwon
M. Erez
41
26
0
30 Sep 2018
The Convergence of Sparsified Gradient Methods
Dan Alistarh
Torsten Hoefler
M. Johansson
Sarit Khirirat
Nikola Konstantinov
Cédric Renggli
181
493
0
27 Sep 2018
Bounding Box Regression with Uncertainty for Accurate Object Detection
Yihui He
Chenchen Zhu
Jianren Wang
Marios Savvides
Xinming Zhang
ObjD
94
471
0
23 Sep 2018
Automated Classification of Sleep Stages and EEG Artifacts in Mice with Deep Learning
J. Schwabedal
D. Sippel
M. Brandt
Stephan Bialonski
31
12
0
22 Sep 2018
Sparsified SGD with Memory
Sebastian U. Stich
Jean-Baptiste Cordonnier
Martin Jaggi
106
753
0
20 Sep 2018
Identifying Generalization Properties in Neural Networks
Huan Wang
N. Keskar
Caiming Xiong
R. Socher
74
50
0
19 Sep 2018
Label Denoising with Large Ensembles of Heterogeneous Neural Networks
Pavel Ostyakov
Elizaveta Logacheva
Roman Suvorov
Vladimir Aliev
Gleb Sterkin
Oleg Khomenko
Sergey I. Nikolenko
NoLa
70
28
0
12 Sep 2018
On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions
Yusuke Tsuzuku
Issei Sato
AAML
82
62
0
11 Sep 2018
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
Shivang Agarwal
Jean Ogier du Terrail
F. Jurie
ObjD
158
125
0
10 Sep 2018
Towards Understanding Regularization in Batch Normalization
Ping Luo
Xinjiang Wang
Wenqi Shao
Zhanglin Peng
MLT
AI4CE
91
180
0
04 Sep 2018
PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track
Takuya Akiba
Tommi Kerola
Yusuke Niitani
Toru Ogawa
Shotaro Sano
Shuji Suzuki
68
20
0
04 Sep 2018
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev
Marcin Junczys-Dowmunt
Kenneth Heafield
Alham Fikri Aji
ODL
49
17
0
27 Aug 2018
DeepTracker: Visualizing the Training Process of Convolutional Neural Networks
Dongyu Liu
Weiwei Cui
Kai Jin
Yuxiao Guo
Huamin Qu
HAI
53
35
0
26 Aug 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
125
432
0
22 Aug 2018
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
E. Georganas
Sasikanth Avancha
K. Banerjee
Dhiraj D. Kalamkar
G. Henry
Hans Pabst
A. Heinecke
BDL
59
107
0
16 Aug 2018
RedSync : Reducing Synchronization Traffic for Distributed Deep Learning
Jiarui Fang
Haohuan Fu
Guangwen Yang
Cho-Jui Hsieh
GNN
106
25
0
13 Aug 2018
Fast, Better Training Trick -- Random Gradient
Jiakai Wei
ODL
23
2
0
13 Aug 2018
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim
Gyeong-In Yu
Hojin Park
Sungwoo Cho
Eunji Jeong
Hyeonmin Ha
Sanha Lee
Joo Seong Jeong
Byung-Gon Chun
67
75
0
08 Aug 2018
Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
Raul Puri
Robert M. Kirby
Nikolai Yakovenko
Bryan Catanzaro
79
29
0
03 Aug 2018