1706.02677
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen
Orhan Firat
Ankur Bapna
Melvin Johnson
Wolfgang Macherey
...
Niki Parmar
M. Schuster
Zhifeng Chen
Yonghui Wu
Macduff Hughes
AIMat
102
457
0
26 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
83
671
0
20 Apr 2018
BigDL: A Distributed Deep Learning Framework for Big Data
J. Dai
Yiheng Wang
Xin Qiu
Ding Ding
Yao Zhang
...
Bowen She
Dongjie Shi
Qiaoling Lu
Kai-Qi Huang
Guoqiong Song
FedML
MoE
52
101
0
16 Apr 2018
Local Descriptors Optimized for Average Precision
Kun He
Yan Lu
Stan Sclaroff
65
196
0
15 Apr 2018
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching
Yosuke Oyama
Tal Ben-Nun
Torsten Hoefler
Satoshi Matsuoka
22
1
0
13 Apr 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
96
1,056
0
11 Apr 2018
Recurrent Neural Networks for Person Re-identification Revisited
J. Boin
A. Araújo
B. Girod
36
4
0
10 Apr 2018
Large scale distributed neural network training through online distillation
Rohan Anil
Gabriel Pereyra
Alexandre Passos
Róbert Ormándi
George E. Dahl
Geoffrey E. Hinton
FedML
339
409
0
09 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
110
312
0
01 Apr 2018
Group Normalization
Yuxin Wu
Kaiming He
259
3,685
0
22 Mar 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
79
171
0
22 Mar 2018
Unsupervised Representation Learning by Predicting Image Rotations
Spyros Gidaris
Praveer Singh
N. Komodakis
OOD
SSL
DRL
278
3,304
0
21 Mar 2018
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Bowen Cheng
Yunchao Wei
Humphrey Shi
Rogerio Feris
Jinjun Xiong
Thomas Huang
ObjD
102
209
0
19 Mar 2018
Towards Image Understanding from Deep Compression without Decoding
Robert Torfason
Fabian Mentzer
E. Agustsson
Michael Tschannen
Radu Timofte
Luc Van Gool
AI4CE
79
155
0
16 Mar 2018
TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu
Mohamed Akrout
Bojian Zheng
Andrew Pelegris
Amar Phanishayee
Bianca Schroeder
Gennady Pekhimenko
90
80
0
16 Mar 2018
Escaping Saddles with Stochastic Gradients
Hadi Daneshmand
Jonas Köhler
Aurelien Lucchi
Thomas Hofmann
75
162
0
15 Mar 2018
GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
J. Daily
Abhinav Vishnu
Charles Siegel
T. Warfel
Vinay C. Amatya
64
95
0
15 Mar 2018
Deep Learning in Mobile and Wireless Networking: A Survey
Chaoyun Zhang
P. Patras
Hamed Haddadi
134
1,320
0
12 Mar 2018
High Throughput Synchronous Distributed Stochastic Gradient Descent
Michael Teng
Frank Wood
52
2
0
12 Mar 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
70
200
0
08 Mar 2018
Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
P. Mitra
21
4
0
08 Mar 2018
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu
Rachel A. Ward
Léon Bottou
70
87
0
07 Mar 2018
Accelerated Methods for Deep Reinforcement Learning
Adam Stooke
Pieter Abbeel
OffRL
OnRL
73
136
0
07 Mar 2018
Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Yuhuai Wu
Mengye Ren
Renjie Liao
Roger C. Grosse
109
138
0
06 Mar 2018
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu
Jingfeng Wu
Ting Yu
Lei Wu
Jin Ma
81
40
0
01 Mar 2018
Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai
Takanori Maehara
140
105
0
28 Feb 2018
Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning
Sanjit Bhat
David Lu
Albert Kwon
S. Devadas
AAML
71
195
0
28 Feb 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
87
713
0
26 Feb 2018
Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs
Andres Milioto
C. Stachniss
SSeg
99
86
0
25 Feb 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
96
119
0
24 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
100
167
0
22 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang Liu
Ulfar Erlingsson
Jernej Kos
Dawn Song
207
1,151
0
22 Feb 2018
SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli
Saleh Ashkboos
Mehdi Aghagolzadeh
Dan Alistarh
Torsten Hoefler
91
127
0
22 Feb 2018
3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
Hyeontaek Lim
D. Andersen
M. Kaminsky
134
70
0
21 Feb 2018
Variance-based Gradient Compression for Efficient Distributed Deep Learning
Yusuke Tsuzuku
H. Imachi
Takuya Akiba
FedML
73
82
0
16 Feb 2018
Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
106
1,223
0
15 Feb 2018
A Progressive Batching L-BFGS Method for Machine Learning
Raghu Bollapragada
Dheevatsa Mudigere
J. Nocedal
Hao-Jun Michael Shi
P. T. P. Tang
ODL
114
153
0
15 Feb 2018
Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Zhihao Jia
Sina Lin
C. Qi
A. Aiken
101
117
0
14 Feb 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache
O. Zinenko
Theodoros Theodoridis
Priya Goyal
Zach DeVito
William S. Moses
Sven Verdoolaege
Andrew Adams
Albert Cohen
126
438
0
13 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li
Jian Li
105
116
0
13 Feb 2018
signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML
ODL
137
1,053
0
13 Feb 2018
Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks
Yusuke Tsuzuku
Issei Sato
Masashi Sugiyama
AAML
117
309
0
12 Feb 2018
ShakeDrop Regularization for Deep Residual Learning
Yoshihiro Yamada
Masakazu Iwamura
Takuya Akiba
K. Kise
119
164
0
07 Feb 2018
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Liangchen Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
61
1
0
30 Jan 2018
On Scale-out Deep Learning Training for Cloud and HPC
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
...
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
BDL
70
30
0
24 Jan 2018
Bayesian Deep Convolutional Encoder-Decoder Networks for Surrogate Modeling and Uncertainty Quantification
Yinhao Zhu
N. Zabaras
UQCV
BDL
115
649
0
21 Jan 2018
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes
Igor Adamski
R. Adamski
T. Grel
Adam Jedrych
Kamil Kaczmarek
Henryk Michalewski
OffRL
121
37
0
09 Jan 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
291
1,901
0
28 Dec 2017
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Hang Zhao
Antonio Torralba
Lorenzo Torresani
Zhicheng Yan
VLM
AI4TS
87
29
0
26 Dec 2017
Block-diagonal Hessian-free Optimization for Training Neural Networks
Huishuai Zhang
Caiming Xiong
James Bradbury
R. Socher
ODL
60
22
0
20 Dec 2017