Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Dynamic Model Pruning with Feedback
Tao R. Lin
Sebastian U. Stich
Luis Barba
Daniil Dmitriev
Martin Jaggi
167
204
0
12 Jun 2020
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani
I. Laradji
Frederik Kunstner
S. Meng
Mark Schmidt
Simon Lacoste-Julien
142
27
0
11 Jun 2020
Data Augmentation for Graph Neural Networks
Tong Zhao
Yozen Liu
Leonardo Neves
Oliver J. Woodford
Meng Jiang
Neil Shah
GNN
164
419
0
11 Jun 2020
Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty
Taejong Joo
U. Chung
BDL, UQCV
34
0
0
11 Jun 2020
Object Detection in the DCT Domain: is Luminance the Solution?
Benjamin Deguerre
Clément Chatelain
Gilles Gasso
ObjD
66
7
0
10 Jun 2020
Extrapolation for Large-batch Training in Deep Learning
Tao R. Lin
Lingjing Kong
Sebastian U. Stich
Martin Jaggi
101
36
0
10 Jun 2020
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow
187
363
0
08 Jun 2020
Multi-step Estimation for Gradient-based Meta-learning
Jin-Hwa Kim
Junyoung Park
Yongseok Choi
75
1
0
08 Jun 2020
AutoHAS: Efficient Hyperparameter and Architecture Search
Xuanyi Dong
Mingxing Tan
Adams Wei Yu
Daiyi Peng
Bogdan Gabrys
Quoc V. Le
TPM
76
23
0
05 Jun 2020
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
Hongyu Zhu
Amar Phanishayee
Gennady Pekhimenko
145
50
0
05 Jun 2020
Scaling Distributed Training with Adaptive Summation
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Olli Saarikivi
Tianju Xu
Vadim Eksarevskiy
Jaliya Ekanayake
Emad Barsoum
18
9
0
04 Jun 2020
Pruning via Iterative Ranking of Sensitivity Statistics
Stijn Verdenius
M. Stol
Patrick Forré
AAML
82
38
0
01 Jun 2020
DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging
Q. Zhou
Yawen Zhang
Pengcheng Li
Xiaoyong Liu
Jun Yang
Runsheng Wang
Ru Huang
FedML
47
2
0
31 May 2020
HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Jay H. Park
Gyeongchan Yun
Chang Yi
N. T. Nguyen
Seungmin Lee
Jaesik Choi
S. Noh
Young-ri Choi
MoE
89
134
0
28 May 2020
Novel Human-Object Interaction Detection via Adversarial Domain Generalization
Yuhang Song
Wenbo Li
Lei Zhang
Jianwei Yang
Emre Kıcıman
Hamid Palangi
Jianfeng Gao
C.-C. Jay Kuo
Pengchuan Zhang
58
5
0
22 May 2020
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
Tongzhou Wang
Phillip Isola
SSL
185
1,871
0
20 May 2020
Map Generation from Large Scale Incomplete and Inaccurate Data Labels
Rui Zhang
C. Albrecht
Wei Zhang
Xiaodong Cui
Ulrich Finkler
David S. Kung
Siyuan Lu
29
12
0
20 May 2020
Associating Multi-Scale Receptive Fields for Fine-grained Recognition
Zihan Ye
Fuyuan Hu
Yin Liu
Zhenping Xia
Fan Lyu
Pengqing Liu
40
16
0
19 May 2020
3D deformable registration of longitudinal abdominopelvic CT images using unsupervised deep learning
Maureen van Eijnatten
L. Rundo
K. Batenburg
F. Lucka
E. Beddowes
C. Caldas
F. Gallagher
Evis Sala
Carola-Bibiane Schönlieb
Ramona Woitek
46
14
0
15 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
64
21
0
15 May 2020
OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu
Dezun Dong
Weixia Xu
Xiangke Liao
47
7
0
14 May 2020
Neural Architecture Transfer
Zhichao Lu
Gautam Sreekumar
E. Goodman
W. Banzhaf
Kalyanmoy Deb
Vishnu Boddeti
AAML
92
151
0
12 May 2020
Benchmark Tests of Convolutional Neural Network and Graph Convolutional Network on HorovodRunner Enabled Spark Clusters
Jing Pan
Wendao Liu
Jing Zhou
GNN, BDL
42
2
0
12 May 2020
AutoCLINT: The Winning Method in AutoCV Challenge 2019
Woonhyuk Baek
Ildoo Kim
Sungwoong Kim
Sungbin Lim
52
2
0
09 May 2020
Blind Backdoors in Deep Learning Models
Eugene Bagdasaryan
Vitaly Shmatikov
AAML, FedML, SILM
163
311
0
08 May 2020
Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks
K. Shukla
P. C. D. Leoni
J. Blackshire
D. Sparkman
George Karniadakis
PINN, AI4CE
93
234
0
07 May 2020
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks
Zhishuai Guo
Mingrui Liu
Zhuoning Yuan
Li Shen
Wei Liu
Tianbao Yang
93
42
0
05 May 2020
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou
Bill Yuchen Lin
Xiang Ren
100
25
0
02 May 2020
Dynamic backup workers for parallel machine learning
Chuan Xu
Giovanni Neglia
Nicola Sebastianelli
72
11
0
30 Apr 2020
Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
Brighten Godfrey
R. Campbell
34
2
0
29 Apr 2020
A novel Region of Interest Extraction Layer for Instance Segmentation
L. Rossi
Akbar Karimi
Andrea Prati
76
65
0
28 Apr 2020
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
100
790
0
28 Apr 2020
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent
Xin-Yao Qian
Diego Klabjan
ODL
72
36
0
27 Apr 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Huayu Chen
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
93
97
0
26 Apr 2020
How to Train your DNN: The Network Operator Edition
M. Chang
D. Bottini
Lisa Jian
Pranay Kumar
Aurojit Panda
S. Shenker
24
1
0
21 Apr 2020
torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
Chiheon Kim
Heungsub Lee
Myungryong Jeong
Woonhyuk Baek
Boogeon Yoon
Ildoo Kim
Sungbin Lim
Sungwoong Kim
MoE, AI4CE
51
54
0
21 Apr 2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li
Zhaoyang Zhang
Xinjiang Wang
Ping Luo
ODL
51
28
0
21 Apr 2020
A Generalization of the Allreduce Operation
D. Kolmakov
Xuecang Zhang
40
5
0
20 Apr 2020
ResNeSt: Split-Attention Networks
Hang Zhang
Chongruo Wu
Zhongyue Zhang
Yi Zhu
Yanghua Peng
...
Tong He
Jonas W. Mueller
R. Manmatha
Mu Li
Alex Smola
173
1,486
0
19 Apr 2020
Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms
Yujing Ma
Florin Rusu
33
3
0
19 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
88
259
0
17 Apr 2020
Spatially Attentive Output Layer for Image Classification
Ildoo Kim
Woonhyuk Baek
Sungwoong Kim
35
25
0
16 Apr 2020
Asynchronous Interaction Aggregation for Action Detection
Jiajun Tang
Jinchao Xia
Xinzhi Mu
Bo Pang
Cewu Lu
82
121
0
16 Apr 2020
ESResNet: Environmental Sound Classification Based on Visual Domain Models
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
VLM
123
94
0
15 Apr 2020
Generating Fact Checking Explanations
Pepa Atanasova
J. Simonsen
Christina Lioma
Isabelle Augenstein
67
200
0
13 Apr 2020
Improved Residual Networks for Image and Video Recognition
Ionut Cosmin Duta
Li Liu
Fan Zhu
Ling Shao
SSeg, AI4TS
61
173
0
10 Apr 2020
Straggler-aware Distributed Learning: Communication Computation Latency Trade-off
Emre Ozfatura
S. Ulukus
Deniz Gunduz
56
42
0
10 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
184
1,029
0
09 Apr 2020
Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
Pengzhan Guo
Zeyang Ye
Keli Xiao
Wei Zhu
50
14
0
07 Apr 2020
Temporal Pyramid Network for Action Recognition
Ceyuan Yang
Yinghao Xu
Jianping Shi
Bo Dai
Bolei Zhou
59
376
0
07 Apr 2020