ResearchTrend.AI
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Title
Receptive Field Size Optimization with Continuous Time Pooling
Dóra Babicz
Soma Kontár
M. Peto
András Fülöp
Gergely Szabó
A. Horváth
26
1
0
02 Nov 2020
A Survey on Contrastive Self-supervised Learning
Ashish Jaiswal
Ashwin Ramesh Babu
Mohammad Zaki Zadeh
Debapriya Banerjee
F. Makedon
SSL
154
1,415
0
31 Oct 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
78
8
0
30 Oct 2020
Why Do Better Loss Functions Lead to Less Transferable Features?
Simon Kornblith
Ting Chen
Honglak Lee
Mohammad Norouzi
FaML
126
92
0
30 Oct 2020
AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data
Romain Egele
Prasanna Balaprakash
V. Vishwanath
Isabelle M Guyon
Zhengying Liu
LMTD
58
21
0
30 Oct 2020
Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
Houwen Peng
Hao Du
Hongyuan Yu
Qi Li
Jing Liao
Jianlong Fu
88
67
0
29 Oct 2020
Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal
Hongyi Wang
Kangwook Lee
Shivaram Venkataraman
Dimitris Papailiopoulos
85
25
0
29 Oct 2020
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
Julieta Martinez
Jashan Shewakramani
Ting Liu
Ioan Andrei Bârsan
Wenyuan Zeng
R. Urtasun
MQ
96
31
0
29 Oct 2020
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets
Kai Han
Yunhe Wang
Qiulin Zhang
Wei Zhang
Chunjing Xu
Tong Zhang
74
89
0
28 Oct 2020
Bayesian Deep Learning via Subnetwork Inference
Erik A. Daxberger
Eric T. Nalisnick
J. Allingham
Javier Antorán
José Miguel Hernández-Lobato
UQCV, BDL
132
86
0
28 Oct 2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen
Deng Huang
Dongliang He
Xiang Long
Runhao Zeng
Shilei Wen
Mingkui Tan
Chuang Gan
SSL
73
134
0
27 Oct 2020
Linearly Converging Error Compensated SGD
Eduard A. Gorbunov
D. Kovalev
Dmitry Makarenko
Peter Richtárik
225
78
0
23 Oct 2020
Deep Neural Mobile Networking
Chaoyun Zhang
78
1
0
23 Oct 2020
Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Sungkyun Chang
Donmoon Lee
Jeongsoon Park
Hyungui Lim
Kyogu Lee
Karam Ko
Yoonchang Han
103
35
0
22 Oct 2020
Hierarchical Federated Learning through LAN-WAN Orchestration
Jinliang Yuan
Mengwei Xu
Xiao Ma
Ao Zhou
Xuanzhe Liu
Shangguang Wang
FedML
60
38
0
22 Oct 2020
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou
Kevin Huang
Tengyu Ma
Jing Huang
137
283
0
21 Oct 2020
How Data Augmentation affects Optimization for Linear Regression
Boris Hanin
Yi Sun
86
16
0
21 Oct 2020
Decentralized Deep Learning using Momentum-Accelerated Consensus
Aditya Balu
Zhanhong Jiang
Sin Yong Tan
Chinmay Hegde
Young M. Lee
Soumik Sarkar
FedML
97
22
0
21 Oct 2020
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters
Shaoshuai Shi
Xianhao Zhou
Shutao Song
Xingyao Wang
Zilin Zhu
...
Chenyang Guo
Bo Yang
Zhibo Chen
Yongjian Wu
Xiaowen Chu
GNN
81
56
0
20 Oct 2020
AutoBSS: An Efficient Algorithm for Block Stacking Style Search
Yikang Zhang
Jian Zhang
Zhaobai Zhong
71
4
0
20 Oct 2020
BYOL works even without batch statistics
Pierre Harvey Richemond
Jean-Bastien Grill
Florent Altché
Corentin Tallec
Florian Strub
...
Samuel L. Smith
Soham De
Razvan Pascanu
Bilal Piot
Michal Valko
SSL
316
115
0
20 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
Haider Al-Tahan
Y. Mohsenzadeh
SSL
195
56
0
19 Oct 2020
Modality-Pairing Learning for Brain Tumor Segmentation
Yixin Wang
Yao Zhang
Feng Hou
Yang Liu
Jiang Tian
Cheng Zhong
Yang Zhang
Zhiqiang He
153
69
0
19 Oct 2020
Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism
Vipul Gupta
Dhruv Choudhary
P. T. P. Tang
Xiaohan Wei
Xing Wang
Yuzhen Huang
A. Kejariwal
Kannan Ramchandran
Michael W. Mahoney
100
33
0
18 Oct 2020
i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee
Yian Zhu
Kihyuk Sohn
Chun-Liang Li
Jinwoo Shin
Honglak Lee
SSL
86
26
0
17 Oct 2020
DIFER: Differentiable Automated Feature Engineering
Guanghui Zhu
Zhuoer Xu
Xu Guo
Chun Yuan
Yihua Huang
66
17
0
17 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Yue Liu
Tara N. Sainath
Yonghui Wu
Ruoming Pang
125
19
0
12 Oct 2020
Top-DB-Net: Top DropBlock for Activation Enhancement in Person Re-Identification
Rodolfo Quispe
Hélio Pedrini
133
43
0
12 Oct 2020
A Predictive Autoscaler for Elastic Batch Jobs
Peng Gao
16
1
0
10 Oct 2020
Regularizing Neural Networks via Adversarial Model Perturbation
Yaowei Zheng
Richong Zhang
Yongyi Mao
AAML
107
99
0
10 Oct 2020
Genetic-algorithm-optimized neural networks for gravitational wave classification
Dwyer Deighan
Scott E. Field
C. Capano
G. Khanna
57
22
0
09 Oct 2020
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Sven Gowal
Chongli Qin
J. Uesato
Timothy A. Mann
Pushmeet Kohli
AAML
75
331
0
07 Oct 2020
High-Capacity Expert Binary Networks
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
MQ
98
59
0
07 Oct 2020
Rotate to Attend: Convolutional Triplet Attention Module
Diganta Misra
Trikay Nalamada
Ajay Uppili Arasanipalai
Qibin Hou
ViT, 3DPC
99
611
0
06 Oct 2020
Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Bita Hasheminezhad
S. Shirzad
Nanmiao Wu
Patrick Diehl
Hannes Schulz
Hartmut Kaiser
GNN, AI4CE
85
4
0
06 Oct 2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li
Kaifeng Lyu
Sanjeev Arora
112
75
0
06 Oct 2020
A Closer Look at Codistillation for Distributed Training
Shagun Sodhani
Olivier Delalleau
Mahmoud Assran
Koustuv Sinha
Nicolas Ballas
Michael G. Rabbat
129
8
0
06 Oct 2020
Improved Analysis of Clipping Algorithms for Non-convex Optimization
Bohang Zhang
Jikai Jin
Cong Fang
Liwei Wang
129
92
0
05 Oct 2020
EqCo: Equivalent Rules for Self-supervised Contrastive Learning
Benjin Zhu
Junqiang Huang
Zeming Li
Xiangyu Zhang
Jian Sun
SSL
79
13
0
05 Oct 2020
Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks
Kun Yuan
Quanquan Li
Dapeng Chen
Aojun Zhou
Junjie Yan
GNN
67
1
0
02 Oct 2020
XDA: Accurate, Robust Disassembly with Transfer Learning
Kexin Pei
Jonas Guan
David Williams-King
Junfeng Yang
Suman Jana
82
63
0
02 Oct 2020
Bag of Tricks for Adversarial Training
Tianyu Pang
Xiao Yang
Yinpeng Dong
Hang Su
Jun Zhu
AAML
90
270
0
01 Oct 2020
Knowledge Fusion Transformers for Video Action Recognition
Ganesh Samarth
Sheetal Ojha
Nikhil Pareek
ViT
59
1
0
29 Sep 2020
Improved generalization by noise enhancement
Takashi Mori
Masahito Ueda
42
3
0
28 Sep 2020
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
Dingqing Yang
Amin Ghasemazar
X. Ren
Maximilian Golub
G. Lemieux
Mieszko Lis
71
49
0
23 Sep 2020
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or
Haoyu Zhang
M. Freedman
73
10
0
20 Sep 2020
Sparse Communication for Training Deep Networks
Negar Foroutan
Martin Jaggi
FedML
69
17
0
19 Sep 2020
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks
Zhiqiang Shen
Marios Savvides
92
63
0
17 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
197
80
0
17 Sep 2020
AAG: Self-Supervised Representation Learning by Auxiliary Augmentation with GNT-Xent Loss
Yanlun Tu
Jianxing Feng
Yang Yang
SSL
29
1
0
17 Sep 2020