ResearchTrend.AI
arXiv: 1706.02677
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Title
Augment your batch: better training with larger batches
Elad Hoffer
Tal Ben-Nun
Itay Hubara
Niv Giladi
Torsten Hoefler
Daniel Soudry
ODL
129
76
0
27 Jan 2019
Fixup Initialization: Residual Learning Without Normalization
Hongyi Zhang
Yann N. Dauphin
Tengyu Ma
ODL, AI4CE
102
351
0
27 Jan 2019
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration
Sangkug Lym
Esha Choukse
Siavash Zangeneh
W. Wen
Sujay Sanghavi
M. Erez
CVBM
81
88
0
26 Jan 2019
DistInit: Learning Video Representations Without a Single Labeled Video
Rohit Girdhar
Du Tran
Lorenzo Torresani
Deva Ramanan
68
54
0
26 Jan 2019
Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov
Xiaohua Zhai
Lucas Beyer
SSL
208
717
0
25 Jan 2019
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Charles H. Martin
Michael W. Mahoney
98
126
0
24 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You
Jonathan Hseu
Chris Ying
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
65
91
0
24 Jan 2019
Trajectory Normalized Gradients for Distributed Optimization
Jianqiao Wangni
Ke Li
Jianbo Shi
Jitendra Malik
47
2
0
24 Jan 2019
Fully Asynchronous Distributed Optimization with Linear Convergence in Directed Networks
Jiaqi Zhang
Keyou You
80
17
0
24 Jan 2019
Decoupled Greedy Learning of CNNs
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
80
117
0
23 Jan 2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo
Wantao Liu
Wang Wang
Q. Lu
Songlin Hu
Jizhong Han
Ruixuan Li
62
9
0
21 Jan 2019
Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks
Charbel Sakr
Naigang Wang
Chia-Yu Chen
Jungwook Choi
A. Agrawal
Naresh R Shanbhag
K. Gopalakrishnan
MQ
73
34
0
19 Jan 2019
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
Myeongjae Jeon
Shivaram Venkataraman
Amar Phanishayee
Junjie Qian
Wencong Xiao
Fan Yang
GNN
91
358
0
17 Jan 2019
Class-Balanced Loss Based on Effective Number of Samples
Huayu Chen
Menglin Jia
Nayeon Lee
Yang Song
Serge J. Belongie
202
2,297
0
16 Jan 2019
A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks
Shaoshuai Shi
Qiang-qiang Wang
Kaiyong Zhao
Zhenheng Tang
Yuxin Wang
Xiang Huang
Xiaowen Chu
90
137
0
14 Jan 2019
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
Shuyang Sun
Jiangmiao Pang
Jianping Shi
Shuai Yi
Wanli Ouyang
117
101
0
11 Jan 2019
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
Cheng-Yang Fu
Mykhailo Shvets
Alexander C. Berg
ObjD
101
141
0
10 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
65
70
0
08 Jan 2019
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
Linghao Song
Jiachen Mao
Youwei Zhuo
Xuehai Qian
Hai Helen Li
Yiran Chen
90
98
0
07 Jan 2019
Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning
Baoyuan Wu
Weidong Chen
Yanbo Fan
Yong Zhang
Jinlong Hou
Jie Liu
Tong Zhang
VLM, MLLM
99
87
0
07 Jan 2019
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters
Tong Geng
Tianqi Wang
Ang Li
Xi Jin
Martin C. Herbordt
FedML, GNN
23
8
0
04 Jan 2019
Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm
Charbel Sakr
Naresh R Shanbhag
MQ
84
43
0
31 Dec 2018
Exploring Weight Symmetry in Deep Neural Networks
S. Hu
Sergey Zagoruyko
N. Komodakis
51
33
0
28 Dec 2018
Stanza: Layer Separation for Distributed Training in Deep Learning
Xiaorui Wu
Hongao Xu
Bo Li
Y. Xiong
MoE
61
9
0
27 Dec 2018
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
76
280
0
14 Dec 2018
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
236
481
0
12 Dec 2018
On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
Aaron Defazio
Léon Bottou
UQCV, DRL
93
113
0
11 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
201
3,300
0
10 Dec 2018
Feature Denoising for Improving Adversarial Robustness
Cihang Xie
Yuxin Wu
Laurens van der Maaten
Alan Yuille
Kaiming He
172
916
0
09 Dec 2018
Three Tools for Practical Differential Privacy
K. V. D. Veen
Ruben Seggers
Peter Bloem
Giorgio Patrini
70
39
0
07 Dec 2018
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
Saurabh N. Adya
Vinay Palakkode
Oncel Tuzel
39
4
0
07 Dec 2018
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
183
709
0
06 Dec 2018
JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs
Eunji Jeong
Sungwoo Cho
Gyeong-In Yu
Joo Seong Jeong
Dongjin Shin
Byung-Gon Chun
59
25
0
04 Dec 2018
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
300
1,421
0
04 Dec 2018
Pre-Defined Sparse Neural Networks with Hardware Acceleration
Sourya Dey
Kuan-Wen Huang
Peter A. Beerel
K. Chugg
109
25
0
04 Dec 2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant
N. Vemuri
Z. Yao
Vladimir Feinberg
A. Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
97
73
0
30 Nov 2018
Graph-Based Global Reasoning Networks
Yunpeng Chen
Marcus Rohrbach
Zhicheng Yan
Shuicheng Yan
Jiashi Feng
Yannis Kalantidis
GNN, NAI
310
460
0
30 Nov 2018
Parsing R-CNN for Instance-Level Human Analysis
Lu Yang
Q. Song
Zhihui Wang
Ming Jiang
SSeg
126
123
0
30 Nov 2018
Data-parallel distributed training of very large models beyond GPU capacity
Samuel Matzek
M. Grossman
Minsik Cho
Anar Yusifov
Bryant Nelson
A. Juneja
GNN
34
3
0
29 Nov 2018
Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects
Hila Levi
S. Ullman
ObjD
78
14
0
29 Nov 2018
Grid R-CNN
Xin Lu
Buyu Li
Yuxin Yue
Quanquan Li
Junjie Yan
ObjD
61
384
0
29 Nov 2018
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa
Yohei Tsuji
Yuichiro Ueno
Akira Naruse
Rio Yokota
Satoshi Matsuoka
ODL
120
95
0
29 Nov 2018
Deep learning for pedestrians: backpropagation in CNNs
L. Boué
3DV, PINN
39
4
0
29 Nov 2018
Spectral Feature Transformation for Person Re-identification
Chuanchen Luo
Yuntao Chen
Naiyan Wang
Zhaoxiang Zhang
111
124
0
28 Nov 2018
Self-Supervised GANs via Auxiliary Rotation Loss
Ting Chen
Xiaohua Zhai
Marvin Ritter
Mario Lucic
N. Houlsby
SSL, GAN
93
302
0
27 Nov 2018
MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms
Shaoshuai Shi
Xiaowen Chu
Bo Li
FedML
91
90
0
27 Nov 2018
Stochastic Gradient Push for Distributed Deep Learning
Mahmoud Assran
Nicolas Loizou
Nicolas Ballas
Michael G. Rabbat
125
347
0
27 Nov 2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
...
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
FedML, GNN, BDL
93
187
0
24 Nov 2018
Hydra: A Peer to Peer Distributed Training & Data Collection Framework
Vaibhav Mathur
K. Chahal
OffRL
35
2
0
24 Nov 2018
MURAUER: Mapping Unlabeled Real Data for Label AUstERity
Georg Poier
M. Opitz
David Schinagl
Horst Bischof
3DH
62
18
0
23 Nov 2018