Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
arXiv 1706.02677 (v2, latest) · 8 June 2017 · 3DH
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
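The recipe at the heart of this paper, which much of the citing work below builds on, is short: when the minibatch grows by a factor k, multiply the learning rate by k (the linear scaling rule), and ramp up to that target rate over a few warmup epochs to keep early training stable. As a point of reference, here is a minimal Python sketch of that schedule; the function names are illustrative, not from the paper, while the defaults (reference batch 256, base LR 0.1, 5 warmup epochs, step decay at epochs 30/60/80) follow the paper's ImageNet setup.

def scaled_lr(base_lr, batch_size, base_batch=256):
    # Linear scaling rule: multiply the reference LR by k = batch_size / base_batch.
    return base_lr * batch_size / base_batch

def lr_at_epoch(epoch, base_lr, target_lr, warmup_epochs=5,
                milestones=(30, 60, 80), gamma=0.1):
    # Gradual warmup: move linearly from the reference LR to the scaled LR
    # over the first warmup_epochs, then apply the usual ImageNet step decay.
    if epoch < warmup_epochs:
        return base_lr + (target_lr - base_lr) * (epoch + 1) / warmup_epochs
    lr = target_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Example: an 8,192-image minibatch is 32x the 256-image reference, so the
# base LR of 0.1 scales to 3.2, reached at the end of the 5-epoch warmup.
target = scaled_lr(0.1, batch_size=8192)
for epoch in (0, 4, 5, 30, 60, 80):
    print(epoch, round(lr_at_epoch(epoch, 0.1, target), 4))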

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

Showing 50 of 2,054 citing papers (title · authors · tags · date):
On the Utility of Gradient Compression in Distributed Training Systems
  Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos · 28 Feb 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
  Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness
  Eric Mintun, A. Kirillov, Saining Xie · 22 Feb 2021

Contour Loss for Instance Segmentation via k-step Distance Transformation Image
  Xiaolong Guo, Xiaosong Lan, Kunfeng Wang, Shuxiao Li · ISeg · 22 Feb 2021

Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory
  Jason Ramapuram, Yan Wu, Alexandros Kalousis · 20 Feb 2021

SWAD: Domain Generalization by Seeking Flat Minima
  Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park · MoMe · 17 Feb 2021

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View
  Sheng-Jun Huang · 17 Feb 2021

IntSGD: Adaptive Floatless Compression of Stochastic Gradients
  Konstantin Mishchenko, Bokun Wang, D. Kovalev, Peter Richtárik · 16 Feb 2021

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
  Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, Wenjie Huang, Tom Goldstein · ODL · 16 Feb 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
  Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Basel Alomair, Ion Stoica · MoE · 16 Feb 2021

MARINA: Faster Non-Convex Distributed Learning with Compression
  Eduard A. Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik · 15 Feb 2021

Learning by Turning: Neural Architecture Aware Optimisation
  Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue · ODL · 14 Feb 2021

Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
  Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho · TTA · 14 Feb 2021

Self-Reorganizing and Rejuvenating CNNs for Increasing Model Capacity Utilization
  Wissam J. Baddar, Seungju Han, Seon-Min Rhee, Jae-Joon Han · 13 Feb 2021

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
  Guojun Xiong, Gang Yan, Rahul Singh, Jian Li · 11 Feb 2021

High-Performance Large-Scale Image Recognition Without Normalization
  Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan · VLM · 11 Feb 2021

Freudian and Newtonian Recurrent Cell for Sequential Recommendation
  Hoyeop Lee, Jinbae Im, Chang Ouk Kim, Sehee Chung · 11 Feb 2021

Strength of Minibatch Noise in SGD
  Liu Ziyin, Kangqiao Liu, Takashi Mori, Masakuni Ueda · ODL, MLT · 10 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
  Gedas Bertasius, Heng Wang, Lorenzo Torresani · ViT · 09 Feb 2021

Consensus Control for Decentralized Deep Learning
  Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich · 09 Feb 2021

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
  Tao R. Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi · FedML · 09 Feb 2021

Large-Scale Training System for 100-Million Classification at Alibaba
  Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin · 09 Feb 2021

Deep Residual Learning in Spiking Neural Networks
  Wei Fang, Zhaofei Yu, Yanqing Chen, Tiejun Huang, T. Masquelier, Yonghong Tian · 08 Feb 2021

Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model
  Ivine Kuruvila, J. Muncke, Eghart Fischer, U. Hoppe · 08 Feb 2021

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
  Tomoya Murata, Taiji Suzuki · FedML · 05 Feb 2021

Truly Sparse Neural Networks at Scale
  Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy · 02 Feb 2021

FEDZIP: A Compression Framework for Communication-Efficient Federated Learning
  Amirhossein Malekijoo, Mohammad Javad Fadaeieslam, Hanieh Malekijou, Morteza Homayounfar, F. Alizadeh-Shabdiz, Reza Rawassizadeh · FedML · 02 Feb 2021

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks
  Qing-Long Zhang · AI4TS · 30 Jan 2021

On the Origin of Implicit Regularization in Stochastic Gradient Descent
  Samuel L. Smith, Benoit Dherin, David Barrett, Soham De · MLT · 28 Jan 2021

Lightweight Multi-Branch Network for Person Re-Identification
  Fabian Herzog, Xunbo Ji, Torben Teepe, S. Hörmann, Johannes Gilg, Gerhard Rigoll · 3DPC · 26 Jan 2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
  A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini · 26 Jan 2021

Learning degraded image classification with restoration data fidelity
  Xiaoyu Lin · 23 Jan 2021

Towards Enhancing Fine-grained Details for Image Matting
  Chang-rui Liu, Henghui Ding, Xudong Jiang · 22 Jan 2021

Time-Correlated Sparsification for Communication-Efficient Federated Learning
  Emre Ozfatura, Kerem Ozfatura, Deniz Gunduz · FedML · 21 Jan 2021

Clairvoyant Prefetching for Distributed Machine Learning I/O
  Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler · 21 Jan 2021

Self-Adaptive Training: Bridging Supervised and Self-Supervised Learning
  Lang Huang, Chaoning Zhang, Hongyang R. Zhang · SSL · 21 Jan 2021

Characterizing signal propagation to close the performance gap in unnormalized ResNets
  Andrew Brock, Soham De, Samuel L. Smith · 21 Jan 2021

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning
  Zeming Li, Songtao Liu, Jian Sun · 19 Jan 2021

Towards Energy Efficient Federated Learning over 5G+ Mobile Devices
  Dian Shi, Liang Li, Rui Chen, Pavana Prakash, Miao Pan, Yuguang Fang · 13 Jan 2021

SparsePipe: Parallel Deep Learning for 3D Point Clouds
  Keke Zhai, Pan He, Tania Banerjee-Mishra, Anand Rangarajan, Sanjay Ranka · 3DPC · 27 Dec 2020

Balance-Oriented Focal Loss with Linear Scheduling for Anchor Free Object Detection
  Hopyong Gil, Sangwook Park, Yusang Park, Wongoo Han, Juyean Hong, Juneyoung Jung · ObjD · 26 Dec 2020

Variance Reduction on General Adaptive Stochastic Mirror Descent
  Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng · 26 Dec 2020

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
  Y. Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin · MQ · 24 Dec 2020

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy
  Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, F. Yu, Zidong Wang, Min Wang · ODL · 24 Dec 2020

Training data-efficient image transformers & distillation through attention
  Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou · ViT · 23 Dec 2020

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
  Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin · MoE · 23 Dec 2020

Deep Unsupervised Image Hashing by Maximizing Bit Entropy
  Yun-qiang Li, Jan van Gemert · 22 Dec 2020

FcaNet: Frequency Channel Attention Networks
  Zequn Qin, Pengyi Zhang, Leilei Gan, Xi Li · 22 Dec 2020

To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices
  Liang Li, Dian Shi, Ronghui Hou, Hui Li, Miao Pan, Zhu Han · FedML · 22 Dec 2020

Regularization in network optimization via trimmed stochastic gradient descent with noisy label
  Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong · NoLa · 21 Dec 2020