ResearchTrend.AI
arXiv: 1706.02677
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
Ashok Cutkosky
Harsh Mehta
83
62
0
28 Jun 2021
Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi
Luis Barba
Martin Jaggi
FedML
131
35
0
25 Jun 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu
R. Kettimuthu
M. Papka
Ian Foster
46
3
0
22 Jun 2021
Unsupervised Object-Level Representation Learning from Scene Images
Jiahao Xie
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
SSL, OCL
90
77
0
22 Jun 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krahenbuhl
VLM, ViT
119
170
0
21 Jun 2021
Secure Distributed Training at Scale
Eduard A. Gorbunov
Alexander Borzunov
Michael Diskin
Max Ryabinin
FedML
90
15
0
21 Jun 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
56
23
0
21 Jun 2021
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation
Enda Yu
Dezun Dong
Yemao Xu
Shuo Ouyang
Xiangke Liao
49
5
0
21 Jun 2021
Distributed Deep Learning in Open Collaborations
Michael Diskin
Alexey Bukhtiyarov
Max Ryabinin
Lucile Saulnier
Quentin Lhoest
...
Denis Mazur
Ilia Kobelev
Yacine Jernite
Thomas Wolf
Gennady Pekhimenko
FedML
131
59
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
110
214
0
17 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM, ViT
153
50
0
17 Jun 2021
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Anish Acharya
Abolfazl Hashemi
Prateek Jain
Sujay Sanghavi
Inderjit S. Dhillon
Ufuk Topcu
70
33
0
16 Jun 2021
To Raise or Not To Raise: The Autonomous Learning Rate Question
Xiaomeng Dong
Tao Tan
Michael Potter
Yun-Chan Tsai
Gaurav Kumar
V. R. Saripalli
Theodore Trafalis
OOD
43
2
0
16 Jun 2021
Self-Supervised Learning with Kernel Dependence Maximization
Yazhe Li
Roman Pogodin
Danica J. Sutherland
Arthur Gretton
SSL
100
85
0
15 Jun 2021
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski
Dimitrios Vytiniotis
G. Swirszcz
Viorica Patraucean
João Carreira
65
8
0
15 Jun 2021
On Large-Cohort Training for Federated Learning
Zachary B. Charles
Zachary Garrett
Zhouyuan Huo
Sergei Shmulyian
Virginia Smith
FedML
79
114
0
15 Jun 2021
NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Minghan Yang
Dong Xu
Qiwen Cui
Zaiwen Wen
Pengxiang Xu
48
4
0
14 Jun 2021
Quality-Aware Network for Face Parsing
Lu Yang
Q. Song
Xueshi Xin
Wenhe Jia
Zhiwei Liu
CVBM
45
3
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin, MQ, AI4MH
177
864
0
14 Jun 2021
Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
Jaron Maene
Mingxiao Li
Marie-Francine Moens
66
15
0
13 Jun 2021
Federated Learning with Buffered Asynchronous Aggregation
John Nguyen
Kshitiz Malik
Hongyuan Zhan
Ashkan Yousefpour
Michael G. Rabbat
Mani Malek
Dzmitry Huba
FedML
101
316
0
11 Jun 2021
Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian
Tengyu Ma
Jason D. Lee
NoLa
145
120
0
11 Jun 2021
Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning
Eugene Belilovsky
Louis Leconte
Lucas Caccia
Michael Eickenberg
Edouard Oyallon
45
7
0
11 Jun 2021
Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
80
39
0
10 Jun 2021
MST: Masked Self-Supervised Transformer for Visual Representation
Zhaowen Li
Zhiyang Chen
Fan Yang
Wei Li
Yousong Zhu
...
Rui Deng
Liwei Wu
Rui Zhao
Ming Tang
Jinqiao Wang
ViT
98
168
0
10 Jun 2021
Salient Object Ranking with Position-Preserved Attention
Haoyang Fang
Daoxin Zhang
Yi Zhang
Minghao Chen
Jiawei Li
Yao Hu
Deng Cai
Xiaofei He
71
21
0
09 Jun 2021
The dilemma of quantum neural networks
Yan Qian
Xinbiao Wang
Yuxuan Du
Xingyao Wu
Dacheng Tao
59
31
0
09 Jun 2021
Interpretable agent communication from scratch (with a generic visual processor emerging on the side)
Roberto Dessì
Eugene Kharitonov
Marco Baroni
95
28
0
08 Jun 2021
Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics
Shiqi Gong
Qi Meng
Yue Wang
Lijun Wu
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
57
4
0
08 Jun 2021
Broadcasted Residual Learning for Efficient Keyword Spotting
Byeonggeun Kim
Simyung Chang
Jinkyu Lee
Dooyong Sung
124
125
0
08 Jun 2021
Asynchronous Distributed Optimization with Redundancy in Cost Functions
Shuo Liu
Nirupam Gupta
Nitin H. Vaidya
74
3
0
07 Jun 2021
Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
A. Labatie
Dominic Masters
Zach Eaton-Rosen
Carlo Luschi
135
21
0
07 Jun 2021
Redundant representations help generalization in wide neural networks
Diego Doimo
Aldo Glielmo
Sebastian Goldt
Alessandro Laio
AI4CE
79
9
0
07 Jun 2021
Rethinking Training from Scratch for Object Detection
Yang Li
Hong Zhang
Yu Zhang
VLM, OnRL, ObjD
62
5
0
06 Jun 2021
Reducing the feature divergence of RGB and near-infrared images using Switchable Normalization
Siwei Yang
Shaozuo Yu
Bingchen Zhao
Yin Wang
108
13
0
06 Jun 2021
SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction
Zeyu Ruan
C. Zou
Longhai Wu
Gangshan Wu
Limin Wang
3DV, CVBM, 3DH
75
56
0
06 Jun 2021
Aligning Pretraining for Detection via Object-Level Contrastive Learning
Fangyun Wei
Yue Gao
Zhirong Wu
Han Hu
Stephen Lin
ObjD
72
148
0
04 Jun 2021
Self-Supervised Learning of Domain Invariant Features for Depth Estimation
Hiroyasu Akada
S. Bhat
Ibraheem Alhashim
Peter Wonka
OOD, SSL, MDE
97
15
0
04 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
91
212
0
03 Jun 2021
Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
Yechao Bai
Ziyuan Huang
Lyuyu Shen
Hongliang Guo
Marcelo H. Ang Jr
Daniela Rus
SSeg
18
4
0
03 Jun 2021
NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning
C. Chang
R. Caruana
Anna Goldenberg
AI4CE
93
80
0
03 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
72
55
0
03 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
87
13
0
01 Jun 2021
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images
Mehdi Cherti
J. Jitsev
LM&MA
119
24
0
31 May 2021
LRTuner: A Learning Rate Tuner for Deep Neural Networks
Nikhil Iyer
V. Thejas
Nipun Kwatra
Ramachandran Ramjee
Muthian Sivathanu
ODL
55
1
0
30 May 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM, GNN
40
35
0
30 May 2021
Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian
Qifan Xu
Boxiang Wang
Yang You
MoE
63
48
0
30 May 2021
Cloud Collectives: Towards Cloud-aware Collectives for ML Workloads with Rank Reordering
Liang Luo
Jacob Nelson
Arvind Krishnamurthy
Luis Ceze
235
1
0
28 May 2021
A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling
Menglu Yu
Chuan Wu
Bo Ji
Jia Liu
50
9
0
28 May 2021
Neonatal seizure detection from raw multi-channel EEG using a fully convolutional architecture
Alison O'Shea
G. Lightbody
Geraldine Boylan
A. Temko
42
121
0
28 May 2021