Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Hierarchical Opacity Propagation for Image Matting
Yaoyi Li
Qin Xu
Hongtao Lu
71
13
0
07 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
117
81
0
06 Apr 2020
TEA: Temporal Excitation and Aggregation for Action Recognition
Yan-Ran Li
Bin Ji
Xintian Shi
Jianguo Zhang
Bin Kang
Limin Wang
ViT
102
450
0
03 Apr 2020
Controllable Orthogonalization in Training DNNs
Lei Huang
Li Liu
Fan Zhu
Diwen Wan
Zehuan Yuan
Bo Li
Ling Shao
90
44
0
02 Apr 2020
M2m: Imbalanced Classification via Major-to-minor Translation
Jaehyung Kim
Jongheon Jeong
Jinwoo Shin
107
225
0
01 Apr 2020
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
Ziyu Liu
Hongwen Zhang
Zhenghao Chen
Zhiyong Wang
Wanli Ouyang
150
837
0
31 Mar 2020
MUXConv: Information Multiplexing in Convolutional Neural Networks
Zhichao Lu
Kalyanmoy Deb
Vishnu Boddeti
50
46
0
31 Mar 2020
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
158
1,703
0
30 Mar 2020
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models
A. Pătraşcu
C. Paduraru
Paul Irofti
43
0
0
30 Mar 2020
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee
Thalaiyasingam Ajanthan
Philip Torr
Martin Jaggi
45
2
0
25 Mar 2020
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu
Pengchong Jin
Hanxiao Liu
Gabriel Bender
Pieter-Jan Kindermans
Mingxing Tan
Thomas Huang
Xiaodan Song
Ruoming Pang
Quoc V. Le
113
304
0
24 Mar 2020
Robust and On-the-fly Dataset Denoising for Image Classification
Jiaming Song
Lunjia Hu
Michael Auli
Yann N. Dauphin
Tengyu Ma
NoLa, OOD
90
13
0
24 Mar 2020
HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing
Deyin Liu
Xu Chen
Zhi Zhou
Qing Ling
95
46
0
22 Mar 2020
BS-NAS: Broadening-and-Shrinking One-Shot NAS with Searchable Numbers of Channels
Zan Shen
Jiang Qian
Bojin Zhuang
Shaojun Wang
Jing Xiao
84
5
0
22 Mar 2020
The Future of Digital Health with Federated Learning
Nicola Rieke
Jonny Hancox
Wenqi Li
Fausto Milletari
H. Roth
...
Ronald M. Summers
Andrew Trask
Daguang Xu
Maximilian Baust
M. Jorge Cardoso
OOD
284
1,811
0
18 Mar 2020
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang
Yukun Zhu
Bradley Green
Hartwig Adam
Alan Yuille
Liang-Chieh Chen
3DPC
136
676
0
17 Mar 2020
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali
Yan Sun
Robert Tibshirani
103
77
0
17 Mar 2020
Revisiting the Sibling Head in Object Detector
Guanglu Song
Yu Liu
Xiaogang Wang
ObjD
257
352
0
17 Mar 2020
Deep Affinity Net: Instance Segmentation via Affinity
Xingqian Xu
M. Chiu
Thomas S. Huang
Humphrey Shi
ISeg, SSeg
82
11
0
15 Mar 2020
Top-1 Solution of Multi-Moments in Time Challenge 2019
Manyuan Zhang
Hao Shao
Guanglu Song
Yu Liu
Junjie Yan
40
3
0
12 Mar 2020
Extended Batch Normalization
Chunjie Luo
Jianfeng Zhan
Lei Wang
Wanling Gao
134
14
0
12 Mar 2020
Equalization Loss for Long-Tailed Object Recognition
Jingru Tan
Changbao Wang
Buyu Li
Quanquan Li
Wanli Ouyang
Changqing Yin
Junjie Yan
329
466
0
11 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
89
283
0
10 Mar 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang
Shaoshuai Shi
Wei Wang
Yue Liu
Xiaowen Chu
83
49
0
10 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
89
49
0
09 Mar 2020
IROF: a low resource evaluation metric for explanation methods
Laura Rieger
Lars Kai Hansen
68
55
0
09 Mar 2020
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer
V. Thejas
Nipun Kwatra
Ramachandran Ramjee
Muthian Sivathanu
89
29
0
09 Mar 2020
Π-nets: Deep Polynomial Neural Networks
Grigorios G. Chrysos
Stylianos Moschoglou
Giorgos Bouritsas
Yannis Panagakis
Jiankang Deng
Stefanos Zafeiriou
85
61
0
08 Mar 2020
CPM R-CNN: Calibrating Point-guided Misalignment in Object Detection
Bin Zhu
Q. Song
Lu Yang
Zhihui Wang
Chun Liu
Mengjie Hu
3DPC, ObjD
138
15
0
07 Mar 2020
ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training
Qinqing Zheng
Bor-Yiing Su
Jiyan Yang
A. Azzolini
Qiang Wu
Ou Jin
S. Karandikar
Hagay Lupesko
Liang Xiong
Eric Zhou
3DH, FedML, GNN
73
8
0
07 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang
Dezun Dong
Yemao Xu
Liquan Xiao
123
12
0
06 Mar 2020
Adaptive Federated Optimization
Sashank J. Reddi
Zachary B. Charles
Manzil Zaheer
Zachary Garrett
Keith Rush
Jakub Konecný
Sanjiv Kumar
H. B. McMahan
FedML
225
1,462
0
29 Feb 2020
Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
Kaidi Xu
Zhouxing Shi
Huan Zhang
Yihan Wang
Kai-Wei Chang
Minlie Huang
B. Kailkhura
Xinyu Lin
Cho-Jui Hsieh
AAML
64
12
0
28 Feb 2020
Learning Representations by Predicting Bags of Visual Words
Spyros Gidaris
Andrei Bursuc
N. Komodakis
P. Pérez
Matthieu Cord
SSL
116
118
0
27 Feb 2020
Disentangling Adaptive Gradient Methods from Learning Rates
Naman Agarwal
Rohan Anil
Elad Hazan
Tomer Koren
Cyril Zhang
109
38
0
26 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
138
151
0
26 Feb 2020
Moniqua: Modulo Quantized Communication in Decentralized SGD
Yucheng Lu
Christopher De Sa
MQ
79
50
0
26 Feb 2020
Stagewise Enlargement of Batch Size for SGD-based Learning
Shen-Yi Zhao
Yin-Peng Xie
Wu-Jun Li
43
1
0
26 Feb 2020
On Feature Normalization and Data Augmentation
Boyi Li
Felix Wu
Ser-Nam Lim
Serge J. Belongie
Kilian Q. Weinberger
56
138
0
25 Feb 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
106
20
0
24 Feb 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc
Aleksander Madry
94
45
0
24 Feb 2020
Self-Adaptive Training: beyond Empirical Risk Minimization
Lang Huang
Chaoning Zhang
Hongyang R. Zhang
NoLa
97
205
0
24 Feb 2020
Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs
Qiang-qiang Wang
Shaoshuai Shi
Canhui Wang
Xiaowen Chu
70
13
0
24 Feb 2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Zhenheng Tang
Shaoshuai Shi
Xiaowen Chu
FedML
62
58
0
22 Feb 2020
Communication-Efficient Edge AI: Algorithms and Systems
Yuanming Shi
Kai Yang
Tao Jiang
Jun Zhang
Khaled B. Letaief
GNN
99
335
0
22 Feb 2020
Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Jianyu Wang
Hao Liang
Gauri Joshi
60
33
0
21 Feb 2020
Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
M. Safaryan
Egor Shulgin
Peter Richtárik
110
61
0
20 Feb 2020
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
Karsten Roth
Timo Milbich
Samarth Sinha
Prateek Gupta
Bjorn Ommer
Joseph Paul Cohen
163
173
0
19 Feb 2020
Rethinking the Hyperparameters for Fine-tuning
Hao Li
Pratik Chaudhari
Hao Yang
Michael Lam
Avinash Ravichandran
Rahul Bhotika
Stefano Soatto
VLM
93
130
0
19 Feb 2020
STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage
Ali Heydarigorji
Mahdi Torabzadehkashi
Siavash Rezaei
Hossein Bobarshad
V. Alves
Pai H. Chou
BDL
49
5
0
17 Feb 2020