Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH
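As context for the list below: the cited paper's headline recipe is the linear scaling rule (when the minibatch size is multiplied by k, multiply the learning rate by k) combined with a gradual warmup from the base rate to the scaled rate over the first five epochs. A minimal Python sketch of that schedule at epoch granularity (the paper ramps per iteration); all names and default values here are illustrative, not taken from any framework:

def lr_at_epoch(epoch, base_lr=0.1, base_batch=256, batch=8192,
                warmup_epochs=5):
    """Learning rate under the linear scaling rule with gradual warmup.

    The target rate is base_lr * (batch / base_batch); training starts
    at base_lr and ramps linearly to the target over warmup_epochs.
    This is a simplified, epoch-granularity sketch of the schedule
    described in Goyal et al. (2017).
    """
    target_lr = base_lr * batch / base_batch
    if epoch < warmup_epochs:
        # Linear interpolation from the base rate up to the scaled target.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

if __name__ == "__main__":
    # With the paper's reference setting (0.1 at batch 256) scaled to
    # batch 8192, the rate ramps from 0.1 to 3.2 over five epochs.
    for e in range(7):
        print(f"epoch {e}: lr = {lr_at_epoch(e):.3f}")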

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

Showing 50 of 2,054 citing papers.
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
52
8
0
18 Dec 2017
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
117
291
0
18 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
95
84
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
45
20
0
08 Dec 2017
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen
Jungwook Choi
D. Brand
A. Agrawal
Wei Zhang
K. Gopalakrishnan
ODL
79
174
0
07 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
112
136
0
06 Dec 2017
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin
Song Han
Huizi Mao
Yu Wang
W. Dally
231
1,413
0
05 Dec 2017
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
...
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
AI4TS
174
1,155
0
05 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
258
3,042
0
30 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
366
8,940
0
21 Nov 2017
MegDet: A Large Mini-Batch Object Detector
Chao Peng
Tete Xiao
Zeming Li
Yuning Jiang
Xiangyu Zhang
Kai Jia
Gang Yu
Jian Sun
ObjD
209
318
0
20 Nov 2017
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Ziming Zhang
Yuanwei Wu
Guanghui Wang
ODL
65
28
0
19 Nov 2017
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Shaoshuai Shi
Qiang-qiang Wang
Xiaowen Chu
90
110
0
16 Nov 2017
AOGNets: Compositional Grammatical Architectures for Deep Learning
Xilai Li
Xi Song
Tianfu Wu
72
26
0
15 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
85
463
0
13 Nov 2017
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Takuya Akiba
Shuji Suzuki
Keisuke Fukuda
VLM
76
314
0
12 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
70
55
0
12 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
63
4
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
130
996
0
01 Nov 2017
ChainerMN: Scalable Distributed Deep Learning Framework
Takuya Akiba
Keisuke Fukuda
Shuji Suzuki
AI4CE, BDL, GNN
65
60
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
104
304
0
30 Oct 2017
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
323
9,831
0
25 Oct 2017
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian
Wei Zhang
Ce Zhang
Ji Liu
ODL
75
500
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
126
253
0
17 Oct 2017
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism
Adam Stooke
Pieter Abbeel
SyDa, GNN
24
0
0
11 Oct 2017
Slim-DP: A Light Communication Data Parallelism for DNN
Shizhao Sun
Wei Chen
Jiang Bian
Xiaoguang Liu
Tie-Yan Liu
24
0
0
27 Sep 2017
Stochastic Nonconvex Optimization with Large Minibatches
Weiran Wang
Nathan Srebro
96
26
0
25 Sep 2017
Deep Sparse Subspace Clustering
Xi Peng
Jiashi Feng
Shijie Xiao
Jiwen Lu
Zhang Yi
Shuicheng Yan
60
22
0
25 Sep 2017
Online Learning of a Memory for Learning Rates
Franziska Meier
Daniel Kappler
S. Schaal
62
21
0
20 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM, LRM
186
57
0
14 Sep 2017
What does fault tolerant Deep Learning need from MPI?
Vinay C. Amatya
Abhinav Vishnu
Charles Siegel
J. Daily
74
19
0
11 Sep 2017
An Adaptive Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation
L. Berger
E. Hyde
M. Jorge Cardoso
Sebastien Ourselin
SSeg
95
41
0
08 Sep 2017
Simple Recurrent Units for Highly Parallelizable Recurrence
Tao Lei
Yu Zhang
Sida I. Wang
Huijing Dai
Yoav Artzi
LRM
165
277
0
08 Sep 2017
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads
Tian Li
Jie Zhong
Ji Liu
Wentao Wu
Ce Zhang
54
70
0
24 Aug 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
108
518
0
23 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
169
854
0
13 Aug 2017
Distributed Training Large-Scale Deep Architectures
Shang-Xuan Zou
Chun-Yen Chen
Jui-Lin Wu
Chun-Nan Chou
Chia-Chin Tsao
Kuan-Chieh Tung
Ting-Wei Lin
Cheng-Lung Sung
Edward Y. Chang
53
22
0
10 Aug 2017
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
178
1,098
0
07 Aug 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML, ODL
111
44
0
26 Jul 2017
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
VLM
87
183
0
18 Jul 2017
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures
Joseph Suárez
Clare Zhu
49
0
0
08 Jul 2017
Stochastic, Distributed and Federated Optimization for Machine Learning
Jakub Konecný
FedML
83
38
0
04 Jul 2017
Parle: parallelizing stochastic gradient descent
Pratik Chaudhari
Carlo Baldassi
R. Zecchina
Stefano Soatto
Ameet Talwalkar
Adam M. Oberman
ODL, FedML
85
21
0
03 Jul 2017
Training a Fully Convolutional Neural Network to Route Integrated Circuits
Sambhav R. Jain
Kye L. Okabe
SSL
22
8
0
27 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
89
11
0
18 Jun 2017
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
107
420
0
14 Jun 2017
Training Quantized Nets: A Deeper Understanding
Hao Li
Soham De
Zheng Xu
Christoph Studer
H. Samet
Tom Goldstein
MQ
87
211
0
07 Jun 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
198
803
0
24 May 2017
Insensitive Stochastic Gradient Twin Support Vector Machine for Large Scale Problems
Zhen Wang
Yuan-Hai Shao
Lan Bai
Li-Ming Liu
N. Deng
43
40
0
19 Apr 2017
Deep Relaxation: partial differential equations for optimizing deep neural networks
Pratik Chaudhari
Adam M. Oberman
Stanley Osher
Stefano Soatto
G. Carlier
174
154
0
17 Apr 2017