Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (arXiv:1706.02677)

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators
Yasuhiro Fujita
Kota Uenishi
Avinash Ummadisingu
P. Nagarajan
Shimpei Masuda
M. Castro
84
18
0
16 Jul 2020
Gradient-based Hyperparameter Optimization Over Long Horizons
P. Micaelli
Amos Storkey
123
15
0
15 Jul 2020
FetchSGD: Communication-Efficient Federated Learning with Sketching
D. Rothchild
Ashwinee Panda
Enayat Ullah
Nikita Ivkin
Ion Stoica
Vladimir Braverman
Joseph E. Gonzalez
Raman Arora
FedML
100
373
0
15 Jul 2020
Long-tail learning via logit adjustment
A. Menon
Sadeep Jayasumana
A. S. Rawat
Himanshu Jain
Andreas Veit
Sanjiv Kumar
132
715
0
14 Jul 2020
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
86
110
0
14 Jul 2020
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang
G. Agrawal
54
5
0
13 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
90
37
0
09 Jul 2020
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
Hongyi Wang
Kartik K. Sreenivasan
Shashank Rajput
Harit Vishwakarma
Saurabh Agarwal
Jy-yong Sohn
Kangwook Lee
Dimitris Papailiopoulos
FedML
121
617
0
09 Jul 2020
Training Sound Event Detection On A Heterogeneous Dataset
Nicolas Turpault
Romain Serizel
82
61
0
08 Jul 2020
Fast Training of Deep Neural Networks Robust to Adversarial Perturbations
Justin A. Goodwin
Olivia M. Brown
Victoria Helus
OOD, AAML
27
3
0
08 Jul 2020
Discretization-Aware Architecture Search
Yunjie Tian
Chang-rui Liu
Lingxi Xie
Jianbin Jiao
QiXiang Ye
67
32
0
07 Jul 2020
FracBits: Mixed Precision Quantization via Fractional Bit-Widths
Linjie Yang
Qing Jin
MQ
101
74
0
04 Jul 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
227
169
0
03 Jul 2020
Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han
Junbin Gao
85
5
0
03 Jul 2020
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan
Yi Rong
Chen Meng
Zongyan Cao
Siyu Wang
...
Jun Yang
Lixue Xia
Lansong Diao
Xiaoyong Liu
Wei Lin
98
242
0
02 Jul 2020
On the Outsized Importance of Learning Rates in Local Update Methods
Zachary B. Charles
Jakub Konecný
FedML
94
54
0
02 Jul 2020
Group Ensemble: Learning an Ensemble of ConvNets in a single ConvNet
Hao Chen
Abhinav Shrivastava
57
14
0
01 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
144
135
0
30 Jun 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi
Chong You
Xinyu Wang
Yi-An Ma
Jitendra Malik
VLM
102
55
0
30 Jun 2020
Vehicle Attribute Recognition by Appearance: Computer Vision Methods for Vehicle Type, Make and Model Classification
Xingyang Ni
H. Huttunen
CVBM
38
20
0
29 Jun 2020
Is SGD a Bayesian sampler? Well, almost
Chris Mingard
Guillermo Valle Pérez
Joar Skalse
A. Louis
BDL
81
53
0
26 Jun 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
MLT
62
100
0
26 Jun 2020
Object-Centric Learning with Slot Attention
Francesco Locatello
Dirk Weissenborn
Thomas Unterthiner
Aravindh Mahendran
G. Heigold
Jakob Uszkoreit
Alexey Dosovitskiy
Thomas Kipf
OCL
243
859
0
26 Jun 2020
Influence Functions in Deep Learning Are Fragile
S. Basu
Phillip E. Pope
Soheil Feizi
TDI
156
237
0
25 Jun 2020
Time-varying Graph Representation Learning via Higher-Order Skip-Gram with Negative Sampling
Simone Piaggesi
André Panisson
47
2
0
25 Jun 2020
Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena
K.R. Jayaram
Saurav Basu
Yogish Sabharwal
Ashish Verma
57
9
0
24 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
129
152
0
24 Jun 2020
Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
Shuai Zheng
Yanghua Peng
Sheng Zha
Mu Li
ODL
72
21
0
24 Jun 2020
iffDetector: Inference-aware Feature Filtering for Object Detection
Mingyuan Mao
Yuxin Tian
Baochang Zhang
QiXiang Ye
Wanquan Liu
Guodong Guo
David Doermann
ObjD
44
11
0
23 Jun 2020
Microstructure Generation via Generative Adversarial Network for Heterogeneous, Topologically Complex 3D Materials
Tim Hsu
W. Epting
Hokon Kim
H. Abernathy
Gregory A. Hackett
A. Rollett
P. Salvador
Elizabeth A. Holm
35
84
0
22 Jun 2020
Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin
Do Yoon Kim
Joo Seong Jeong
Byung-Gon Chun
52
4
0
22 Jun 2020
Deep Polynomial Neural Networks
Grigorios G. Chrysos
Stylianos Moschoglou
Giorgos Bouritsas
Jiankang Deng
Yannis Panagakis
Stefanos Zafeiriou
91
94
0
20 Jun 2020
How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath
Amit Deshpande
K. Subrahmanyam
AAML
41
3
0
20 Jun 2020
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition
Ionut Cosmin Duta
Li Liu
Fan Zhu
Ling Shao
70
198
0
20 Jun 2020
DEED: A General Quantization Scheme for Communication Efficiency in Bits
Tian-Chun Ye
Peijun Xiao
Ruoyu Sun
FedML, MQ
40
2
0
19 Jun 2020
Tent: Fully Test-time Adaptation by Entropy Minimization
Dequan Wang
Evan Shelhamer
Shaoteng Liu
Bruno A. Olshausen
Trevor Darrell
OOD
140
53
0
18 Jun 2020
Cyclic Differentiable Architecture Search
Hongyuan Yu
Houwen Peng
Yan Huang
Jianlong Fu
Hao Du
Liang Wang
Haibin Ling
3DPC
123
48
0
18 Jun 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL, SSL
352
4,113
0
17 Jun 2020
Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods
Minghan Yang
Dong Xu
Yongfeng Li
Zaiwen Wen
Mengyun Chen
ODL
47
3
0
17 Jun 2020
Improving accuracy and speeding up Document Image Classification through parallel systems
Javier Ferrando
J. L. Domínguez
Jordi Torres
Raul Garcia
David García
Daniel Garrido
J. Cortada
M. Valero
119
26
0
16 Jun 2020
1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020
Siyu Chen
Junting Pan
Guanglu Song
Manyuan Zhang
Hao Shao
Ziyi Lin
Jing Shao
Hongsheng Li
Yu Liu
3DPC
51
4
0
16 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
148
50
0
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
Jason D. Lee
Tengyu Ma
216
95
0
15 Jun 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
121
15
0
15 Jun 2020
Depth Uncertainty in Neural Networks
Javier Antorán
J. Allingham
José Miguel Hernández-Lobato
UQCV, OOD, BDL
119
103
0
15 Jun 2020
Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD
Ruosi Wan
Zhanxing Zhu
Xiangyu Zhang
Jian Sun
78
11
0
15 Jun 2020
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
Junting Pan
Siyu Chen
Zheng Shou
Yu Liu
Jing Shao
Hongsheng Li
3DPC
112
151
0
14 Jun 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
494
6,878
0
13 Jun 2020
Adversarial Self-Supervised Contrastive Learning
Minseon Kim
Jihoon Tack
Sung Ju Hwang
SSL
99
251
0
13 Jun 2020
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Subhadeep Bhattacharya
Weikuan Yu
Fahim Chowdhury
FedML
26
2
0
12 Jun 2020