Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mengzhao Chen
Orhan Firat
Ankur Bapna
Melvin Johnson
Wolfgang Macherey
...
Niki Parmar
M. Schuster
Zhifeng Chen
Yonghui Wu
Macduff Hughes
AIMat
102
457
0
26 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
83
671
0
20 Apr 2018
BigDL: A Distributed Deep Learning Framework for Big Data
J. Dai
Yiheng Wang
Xin Qiu
Ding Ding
Yao Zhang
...
Bowen She
Dongjie Shi
Qiaoling Lu
Kai-Qi Huang
Guoqiong Song
FedML, MoE
52
101
0
16 Apr 2018
Local Descriptors Optimized for Average Precision
Kun He
Yan Lu
Stan Sclaroff
65
196
0
15 Apr 2018
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching
Yosuke Oyama
Tal Ben-Nun
Torsten Hoefler
Satoshi Matsuoka
22
1
0
13 Apr 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
96
1,056
0
11 Apr 2018
Recurrent Neural Networks for Person Re-identification Revisited
J. Boin
A. Araújo
B. Girod
36
4
0
10 Apr 2018
Large scale distributed neural network training through online distillation
Rohan Anil
Gabriel Pereyra
Alexandre Passos
Róbert Ormándi
George E. Dahl
Geoffrey E. Hinton
FedML
339
409
0
09 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
110
312
0
01 Apr 2018
Group Normalization
Yuxin Wu
Kaiming He
259
3,685
0
22 Mar 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
79
171
0
22 Mar 2018
Unsupervised Representation Learning by Predicting Image Rotations
Spyros Gidaris
Praveer Singh
N. Komodakis
OOD, SSL, DRL
278
3,304
0
21 Mar 2018
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Bowen Cheng
Yunchao Wei
Humphrey Shi
Rogerio Feris
Jinjun Xiong
Thomas Huang
ObjD
102
209
0
19 Mar 2018
Towards Image Understanding from Deep Compression without Decoding
Robert Torfason
Fabian Mentzer
E. Agustsson
Michael Tschannen
Radu Timofte
Luc Van Gool
AI4CE
79
155
0
16 Mar 2018
TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu
Mohamed Akrout
Bojian Zheng
Andrew Pelegris
Amar Phanishayee
Bianca Schroeder
Gennady Pekhimenko
90
80
0
16 Mar 2018
Escaping Saddles with Stochastic Gradients
Hadi Daneshmand
Jonas Köhler
Aurelien Lucchi
Thomas Hofmann
75
162
0
15 Mar 2018
GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
J. Daily
Abhinav Vishnu
Charles Siegel
T. Warfel
Vinay C. Amatya
64
95
0
15 Mar 2018
Deep Learning in Mobile and Wireless Networking: A Survey
Chaoyun Zhang
P. Patras
Hamed Haddadi
134
1,320
0
12 Mar 2018
High Throughput Synchronous Distributed Stochastic Gradient Descent
Michael Teng
Frank Wood
52
2
0
12 Mar 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
70
200
0
08 Mar 2018
Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
P. Mitra
21
4
0
08 Mar 2018
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu
Rachel A. Ward
Léon Bottou
70
87
0
07 Mar 2018
Accelerated Methods for Deep Reinforcement Learning
Adam Stooke
Pieter Abbeel
OffRL, OnRL
73
136
0
07 Mar 2018
Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Yuhuai Wu
Mengye Ren
Renjie Liao
Roger C. Grosse
109
138
0
06 Mar 2018
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu
Jingfeng Wu
Ting Yu
Lei Wu
Jin Ma
81
40
0
01 Mar 2018
Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai
Takanori Maehara
140
105
0
28 Feb 2018
Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning
Sanjit Bhat
David Lu
Albert Kwon
S. Devadas
AAML
71
195
0
28 Feb 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
87
713
0
26 Feb 2018
Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs
Andres Milioto
C. Stachniss
SSeg
99
86
0
25 Feb 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
96
119
0
24 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
100
167
0
22 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Ulfar Erlingsson
Jernej Kos
Basel Alomair
207
1,151
0
22 Feb 2018
SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli
Saleh Ashkboos
Mehdi Aghagolzadeh
Dan Alistarh
Torsten Hoefler
91
127
0
22 Feb 2018
3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
Hyeontaek Lim
D. Andersen
M. Kaminsky
134
70
0
21 Feb 2018
Variance-based Gradient Compression for Efficient Distributed Deep Learning
Yusuke Tsuzuku
H. Imachi
Takuya Akiba
FedML
73
82
0
16 Feb 2018
Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
106
1,223
0
15 Feb 2018
A Progressive Batching L-BFGS Method for Machine Learning
Raghu Bollapragada
Dheevatsa Mudigere
J. Nocedal
Hao-Jun Michael Shi
P. T. P. Tang
ODL
114
153
0
15 Feb 2018
Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Zhihao Jia
Sina Lin
C. Qi
A. Aiken
101
117
0
14 Feb 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache
O. Zinenko
Theodoros Theodoridis
Priya Goyal
Zach DeVito
William S. Moses
Sven Verdoolaege
Andrew Adams
Albert Cohen
126
438
0
13 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li
Jian Li
105
116
0
13 Feb 2018
signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML, ODL
137
1,053
0
13 Feb 2018
Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks
Yusuke Tsuzuku
Issei Sato
Masashi Sugiyama
AAML
117
309
0
12 Feb 2018
ShakeDrop Regularization for Deep Residual Learning
Yoshihiro Yamada
Masakazu Iwamura
Takuya Akiba
K. Kise
119
164
0
07 Feb 2018
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Liangchen Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
61
1
0
30 Jan 2018
On Scale-out Deep Learning Training for Cloud and HPC
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
...
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
BDL
70
30
0
24 Jan 2018
Bayesian Deep Convolutional Encoder-Decoder Networks for Surrogate Modeling and Uncertainty Quantification
Yinhao Zhu
N. Zabaras
UQCV, BDL
115
649
0
21 Jan 2018
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes
Igor Adamski
R. Adamski
T. Grel
Adam Jedrych
Kamil Kaczmarek
Henryk Michalewski
OffRL
121
37
0
09 Jan 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
291
1,901
0
28 Dec 2017
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Hang Zhao
Antonio Torralba
Lorenzo Torresani
Zhicheng Yan
VLM, AI4TS
87
29
0
26 Dec 2017
Block-diagonal Hessian-free Optimization for Training Neural Networks
Huishuai Zhang
Caiming Xiong
James Bradbury
R. Socher
ODL
60
22
0
20 Dec 2017