1706.02677
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
Dynamic Model Pruning with Feedback
Tao R. Lin
Sebastian U. Stich
Luis Barba
Daniil Dmitriev
Martin Jaggi
167
204
0
12 Jun 2020
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani
I. Laradji
Frederik Kunstner
S. Meng
Mark Schmidt
Simon Lacoste-Julien
142
27
0
11 Jun 2020
Data Augmentation for Graph Neural Networks
Tong Zhao
Yozen Liu
Leonardo Neves
Oliver J. Woodford
Meng Jiang
Neil Shah
GNN
164
419
0
11 Jun 2020
Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty
Taejong Joo
U. Chung
BDL
UQCV
34
0
0
11 Jun 2020
Object Detection in the DCT Domain: is Luminance the Solution?
Benjamin Deguerre
Clément Chatelain
Gilles Gasso
ObjD
66
7
0
10 Jun 2020
Extrapolation for Large-batch Training in Deep Learning
Tao R. Lin
Lingjing Kong
Sebastian U. Stich
Martin Jaggi
101
36
0
10 Jun 2020
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow
187
363
0
08 Jun 2020
Multi-step Estimation for Gradient-based Meta-learning
Jin-Hwa Kim
Junyoung Park
Yongseok Choi
75
1
0
08 Jun 2020
AutoHAS: Efficient Hyperparameter and Architecture Search
Xuanyi Dong
Mingxing Tan
Adams Wei Yu
Daiyi Peng
Bogdan Gabrys
Quoc V. Le
TPM
76
23
0
05 Jun 2020
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
Hongyu Zhu
Amar Phanishayee
Gennady Pekhimenko
145
50
0
05 Jun 2020
Scaling Distributed Training with Adaptive Summation
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Olli Saarikivi
Tianju Xu
Vadim Eksarevskiy
Jaliya Ekanayake
Emad Barsoum
18
9
0
04 Jun 2020
Pruning via Iterative Ranking of Sensitivity Statistics
Stijn Verdenius
M. Stol
Patrick Forré
AAML
82
38
0
01 Jun 2020
DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging
Q. Zhou
Yawen Zhang
Pengcheng Li
Xiaoyong Liu
Jun Yang
Runsheng Wang
Ru Huang
FedML
47
2
0
31 May 2020
HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Jay H. Park
Gyeongchan Yun
Chang Yi
N. T. Nguyen
Seungmin Lee
Jaesik Choi
S. Noh
Young-ri Choi
MoE
89
134
0
28 May 2020
Novel Human-Object Interaction Detection via Adversarial Domain Generalization
Yuhang Song
Wenbo Li
Lei Zhang
Jianwei Yang
Emre Kıcıman
Hamid Palangi
Jianfeng Gao
C.-C. Jay Kuo
Pengchuan Zhang
58
5
0
22 May 2020
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
Tongzhou Wang
Phillip Isola
SSL
185
1,871
0
20 May 2020
Map Generation from Large Scale Incomplete and Inaccurate Data Labels
Rui Zhang
C. Albrecht
Wei Zhang
Xiaodong Cui
Ulrich Finkler
David S. Kung
Siyuan Lu
29
12
0
20 May 2020
Associating Multi-Scale Receptive Fields for Fine-grained Recognition
Zihan Ye
Fuyuan Hu
Yin Liu
Zhenping Xia
Fan Lyu
Pengqing Liu
40
16
0
19 May 2020
3D deformable registration of longitudinal abdominopelvic CT images using unsupervised deep learning
Maureen van Eijnatten
L. Rundo
K. Batenburg
F. Lucka
E. Beddowes
C. Caldas
F. Gallagher
Evis Sala
Carola-Bibiane Schönlieb
Ramona Woitek
46
14
0
15 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
64
21
0
15 May 2020
OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu
Dezun Dong
Weixia Xu
Xiangke Liao
47
7
0
14 May 2020
Neural Architecture Transfer
Zhichao Lu
Gautam Sreekumar
E. Goodman
W. Banzhaf
Kalyanmoy Deb
Vishnu Boddeti
AAML
92
151
0
12 May 2020
Benchmark Tests of Convolutional Neural Network and Graph Convolutional Network on HorovodRunner Enabled Spark Clusters
Jing Pan
Wendao Liu
Jing Zhou
GNN
BDL
42
2
0
12 May 2020
AutoCLINT: The Winning Method in AutoCV Challenge 2019
Woonhyuk Baek
Ildoo Kim
Sungwoong Kim
Sungbin Lim
52
2
0
09 May 2020
Blind Backdoors in Deep Learning Models
Eugene Bagdasaryan
Vitaly Shmatikov
AAML
FedML
SILM
163
311
0
08 May 2020
Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks
K. Shukla
P. C. D. Leoni
J. Blackshire
D. Sparkman
George Karniadakis
PINN
AI4CE
93
234
0
07 May 2020
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks
Zhishuai Guo
Mingrui Liu
Zhuoning Yuan
Li Shen
Wei Liu
Tianbao Yang
93
42
0
05 May 2020
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou
Bill Yuchen Lin
Xiang Ren
100
25
0
02 May 2020
Dynamic backup workers for parallel machine learning
Chuan Xu
Giovanni Neglia
Nicola Sebastianelli
72
11
0
30 Apr 2020
Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
Brighten Godfrey
R. Campbell
34
2
0
29 Apr 2020
A novel Region of Interest Extraction Layer for Instance Segmentation
L. Rossi
Akbar Karimi
Andrea Prati
76
65
0
28 Apr 2020
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
100
790
0
28 Apr 2020
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent
Xin-Yao Qian
Diego Klabjan
ODL
72
36
0
27 Apr 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Huayu Chen
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
93
97
0
26 Apr 2020
How to Train your DNN: The Network Operator Edition
M. Chang
D. Bottini
Lisa Jian
Pranay Kumar
Aurojit Panda
S. Shenker
24
1
0
21 Apr 2020
torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
Chiheon Kim
Heungsub Lee
Myungryong Jeong
Woonhyuk Baek
Boogeon Yoon
Ildoo Kim
Sungbin Lim
Sungwoong Kim
MoE
AI4CE
51
54
0
21 Apr 2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li
Zhaoyang Zhang
Xinjiang Wang
Ping Luo
ODL
51
28
0
21 Apr 2020
A Generalization of the Allreduce Operation
D. Kolmakov
Xuecang Zhang
40
5
0
20 Apr 2020
ResNeSt: Split-Attention Networks
Hang Zhang
Chongruo Wu
Zhongyue Zhang
Yi Zhu
Yanghua Peng
...
Tong He
Jonas W. Mueller
R. Manmatha
Mu Li
Alex Smola
173
1,486
0
19 Apr 2020
Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms
Yujing Ma
Florin Rusu
33
3
0
19 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
88
259
0
17 Apr 2020
Spatially Attentive Output Layer for Image Classification
Ildoo Kim
Woonhyuk Baek
Sungwoong Kim
35
25
0
16 Apr 2020
Asynchronous Interaction Aggregation for Action Detection
Jiajun Tang
Jinchao Xia
Xinzhi Mu
Bo Pang
Cewu Lu
82
121
0
16 Apr 2020
ESResNet: Environmental Sound Classification Based on Visual Domain Models
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
VLM
123
94
0
15 Apr 2020
Generating Fact Checking Explanations
Pepa Atanasova
J. Simonsen
Christina Lioma
Isabelle Augenstein
67
200
0
13 Apr 2020
Improved Residual Networks for Image and Video Recognition
Ionut Cosmin Duta
Li Liu
Fan Zhu
Ling Shao
SSeg
AI4TS
61
173
0
10 Apr 2020
Straggler-aware Distributed Learning: Communication Computation Latency Trade-off
Emre Ozfatura
S. Ulukus
Deniz Gunduz
56
42
0
10 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
184
1,029
0
09 Apr 2020
Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
Pengzhan Guo
Zeyang Ye
Keli Xiao
Wei Zhu
50
14
0
07 Apr 2020
Temporal Pyramid Network for Action Recognition
Ceyuan Yang
Yinghao Xu
Jianping Shi
Bo Dai
Bolei Zhou
59
376
0
07 Apr 2020