1706.02677
Cited By
v1
v2 (latest)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
Analyzing Human-Human Interactions: A Survey
Alexandros Stergiou
R. Poppe
69
14
0
31 Jul 2018
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan
Bo Chen
Ruoming Pang
Vijay Vasudevan
Mark Sandler
Andrew G. Howard
Quoc V. Le
MQ
139
3,022
0
31 Jul 2018
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
Xianyan Jia
Shutao Song
W. He
Yangzihao Wang
Haidong Rong
...
Li Yu
Tiegang Chen
Guangxiao Hu
Shaoshuai Shi
Xiaowen Chu
112
384
0
30 Jul 2018
Pythia v0.1: the Winning Entry to the VQA Challenge 2018
Yu Jiang
Vivek Natarajan
Xinlei Chen
Marcus Rohrbach
Dhruv Batra
Devi Parikh
VLM
101
203
0
26 Jul 2018
An argument in favor of strong scaling for deep neural networks with small datasets
R. L. F. Cunha
Eduardo Rodrigues
Matheus Palhares Viana
Dario Augusto Borges Oliveira
41
2
0
24 Jul 2018
Supporting Very Large Models using Automatic Dataflow Graph Partitioning
Minjie Wang
Chien-chin Huang
Jinyang Li
120
155
0
24 Jul 2018
Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems
Chih-Hao Fang
Sudhir B. Kylasa
Fred Roosta
Michael W. Mahoney
A. Grama
ODL
58
10
0
18 Jul 2018
Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu
Sen Yang
Shenghuo Zhu
MoMe
FedML
98
611
0
17 Jul 2018
3D Inception-based CNN with sMRI and MD-DTI data fusion for Alzheimer's Disease diagnostics
A. Khvostikov
Karim Aderghal
A. Krylov
G. Catheline
J. Benois-Pineau
MedIm
50
28
0
17 Jul 2018
Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia
Matei A. Zaharia
A. Aiken
GNN
AI4CE
71
508
0
14 Jul 2018
Tune: A Research Platform for Distributed Model Selection and Training
Richard Liaw
Eric Liang
Robert Nishihara
Philipp Moritz
Joseph E. Gonzalez
Ion Stoica
227
906
0
13 Jul 2018
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Stanislaw Jastrzebski
Zachary Kenton
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
ODL
100
118
0
13 Jul 2018
Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention
Xingyu Liao
Lingxiao He
Zhouwang Yang
Chi Zhang
3DPC
80
73
0
12 Jul 2018
Restructuring Batch Normalization to Accelerate CNN Training
Wonkyung Jung
Daejin Jung
Byeongho Kim
Sunjung Lee
Wonjong Rhee
Jung Ho Ahn
57
64
0
04 Jul 2018
Stochastic Layer-Wise Precision in Deep Neural Networks
Griffin Lacey
Graham W. Taylor
S. Areibi
86
18
0
03 Jul 2018
Differentiable Learning-to-Normalize via Switchable Normalization
Ping Luo
Jiamin Ren
Zhanglin Peng
Ruimao Zhang
Jingyu Li
92
177
0
28 Jun 2018
A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker
D. Erhan
Pieter-Jan Kindermans
Been Kim
FAtt
UQCV
134
684
0
28 Jun 2018
Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith
Daniel Duckworth
Semon Rezchikov
Quoc V. Le
Jascha Narain Sohl-Dickstein
BDL
85
6
0
25 Jun 2018
Pushing the boundaries of parallel Deep Learning -- A practical approach
Paolo Viviani
M. Drocco
Marco Aldinucci
OOD
50
0
0
25 Jun 2018
Como funciona o Deep Learning
M. Ponti
G. B. P. D. Costa
44
14
0
20 Jun 2018
Faster SGD training by minibatch persistency
M. Fischetti
Iacopo Mandatelli
Domenico Salvagnin
49
5
0
19 Jun 2018
Kernel machines that adapt to GPUs for effective large batch training
Siyuan Ma
M. Belkin
22
2
0
15 Jun 2018
Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device
Hengfui Liau
Yamini Nimmagadda
YengLiong Wong
ObjD
51
14
0
14 Jun 2018
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Thomas George
César Laurent
Xavier Bouthillier
Nicolas Ballas
Pascal Vincent
ODL
105
156
0
11 Jun 2018
The Effect of Network Width on the Performance of Large-batch Training
Lingjiao Chen
Hongyi Wang
Jinman Zhao
Dimitris Papailiopoulos
Paraschos Koutris
87
22
0
11 Jun 2018
PipeDream: Fast and Efficient Pipeline Parallel DNN Training
A. Harlap
Deepak Narayanan
Amar Phanishayee
Vivek Seshadri
Nikhil R. Devanur
G. Ganger
Phillip B. Gibbons
AI4CE
73
256
0
08 Jun 2018
Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments
Chen Chen
Qizhen Weng
Wei Wang
Baochun Li
Bo Li
52
27
0
07 Jun 2018
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Rex Ying
Ruining He
Kaifeng Chen
Pong Eksombatchai
William L. Hamilton
J. Leskovec
GNN
BDL
372
3,572
0
06 Jun 2018
Perturbative Neural Networks
Felix Juefei Xu
Vishnu Boddeti
Marios Savvides
68
38
0
05 Jun 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward
Xiaoxia Wu
Léon Bottou
ODL
113
369
0
05 Jun 2018
Videos as Space-Time Region Graphs
Xiaolong Wang
Abhinav Gupta
128
756
0
05 Jun 2018
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel Nacson
Nathan Srebro
Daniel Soudry
FedML
MLT
102
102
0
05 Jun 2018
Layer rotation: a surprisingly powerful indicator of generalization in deep networks?
Simon Carbonnelle
Christophe De Vleeschouwer
MLT
70
1
0
05 Jun 2018
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark
Cody Coleman
Daniel Kang
Deepak Narayanan
Luigi Nardi
Tian Zhao
Jian Zhang
Peter Bailis
K. Olukotun
Christopher Ré
Matei A. Zaharia
60
117
0
04 Jun 2018
End to End Brain Fiber Orientation Estimation using Deep Learning
Nandakishore Puttashamachar
Ulas Bagci
MedIm
15
0
0
04 Jun 2018
Scaling Neural Machine Translation
Myle Ott
Sergey Edunov
David Grangier
Michael Auli
AIMat
206
617
0
01 Jun 2018
Understanding Batch Normalization
Johan Bjorck
Carla P. Gomes
B. Selman
Kilian Q. Weinberger
196
619
0
01 Jun 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
213
1,072
0
24 May 2018
Online Regularized Nonlinear Acceleration
Damien Scieur
Edouard Oyallon
Alexandre d’Aspremont
Francis R. Bach
41
13
0
24 May 2018
Do Better ImageNet Models Transfer Better?
Simon Kornblith
Jonathon Shlens
Quoc V. Le
OOD
MLT
176
1,333
0
23 May 2018
Approximate Random Dropout
Zhuoran Song
Ru Wang
Dongyu Ru
Hongru Huang
Zhenghao Peng
Hai Zhao
Xiaoyao Liang
Li Jiang
BDL
44
9
0
23 May 2018
Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit
Seyed Hamed Fatemi Langroudi
Tej Pandit
Dhireesha Kudithipudi
MQ
57
41
0
22 May 2018
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Joeri Hermans
Gilles Louppe
53
5
0
22 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
77
79
0
21 May 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen
Yandan Wang
Feng Yan
Cong Xu
Chunpeng Wu
Yiran Chen
H. Li
79
51
0
21 May 2018
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
154
121
0
21 May 2018
Dynamic learning rate using Mutual Information
Shrihari Vasudevan
30
6
0
18 May 2018
A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
Shaoshuai Shi
Qiang-qiang Wang
Xiaowen Chu
Yue Liu
FedML
GNN
45
23
0
10 May 2018
PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening
Qingjie Liu
Huanyu Zhou
Qizhi Xu
Xiangyu Liu
Yunhong Wang
GAN
58
223
0
09 May 2018
Exploring the Limits of Weakly Supervised Pretraining
D. Mahajan
Ross B. Girshick
Vignesh Ramanathan
Kaiming He
Manohar Paluri
Yixuan Li
Ashwin R. Bharambe
Laurens van der Maaten
VLM
207
1,371
0
02 May 2018