Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
88
1
0
09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
94
6
0
08 Dec 2021
DiPS: Differentiable Policy for Sketching in Recommender Systems
Aritra Ghosh
Saayan Mitra
Andrew Lan
BDL, OffRL
57
2
0
08 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
163
699
0
02 Dec 2021
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent
Wei Zhang
Mingrui Liu
Yu Feng
Xiaodong Cui
Brian Kingsbury
Yuhai Tu
53
3
0
02 Dec 2021
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai
Yuhua Zhu
49
4
0
02 Dec 2021
The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification
Seulki Park
Youngkyu Hong
Byeongho Heo
Sangdoo Yun
J. Choi
131
157
0
01 Dec 2021
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Lukas Hoyer
Dengxin Dai
Luc Van Gool
AI4CE
107
462
0
29 Nov 2021
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
98
7
0
26 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
89
53
0
25 Nov 2021
Self-Distilled Self-Supervised Representation Learning
Jiho Jang
Seonhoon Kim
Kiyoon Yoo
Chaerin Kong
Jang-Hyun Kim
Nojun Kwak
SSL
98
15
0
25 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
100
30
0
24 Nov 2021
ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment
Robin Karlsson
Tomoki Hayashi
Keisuke Fujii
Alexander Carballo
Kento Ohtani
K. Takeda
SSL
71
4
0
24 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
68
67
0
23 Nov 2021
Benchmarking Detection Transfer Learning with Vision Transformers
Yanghao Li
Saining Xie
Xinlei Chen
Piotr Dollár
Kaiming He
Ross B. Girshick
118
170
0
22 Nov 2021
Combined Scaling for Zero-shot Transfer Learning
Hieu H. Pham
Zihang Dai
Golnaz Ghiasi
Kenji Kawaguchi
Hanxiao Liu
...
Yi-Ting Chen
Minh-Thang Luong
Yonghui Wu
Mingxing Tan
Quoc V. Le
VLM
120
202
0
19 Nov 2021
Rethinking Dilated Convolution for Real-time Semantic Segmentation
Roland Gao
SSeg
78
45
0
18 Nov 2021
Evaluating Transformers for Lightweight Action Recognition
Raivo Koot
Markus Hennerbichler
Haiping Lu
ViT
82
8
0
18 Nov 2021
Recurrent Variational Network: A Deep Learning Inverse Problem Solver applied to the task of Accelerated MRI Reconstruction
George Yiasemis
Jan-Jakob Sonke
C. Sánchez
Jonas Teuwen
148
61
0
18 Nov 2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Sian Jin
Chengming Zhang
Xintong Jiang
Yunhe Feng
Hui Guan
Guanpeng Li
Shuaiwen Leon Song
Dingwen Tao
46
25
0
18 Nov 2021
Deep neural networks-based denoising models for CT imaging and their efficacy
Prabhat Kc
R. Zeng
M. M. Farhangi
Kyle J. Myers
29
20
0
18 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
110
34
0
16 Nov 2021
CGX: Adaptive System Support for Communication-Efficient Deep Learning
I. Markov
Hamidreza Ramezanikebrya
Dan Alistarh
GNN
82
5
0
16 Nov 2021
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation
William J. McNally
Kanav Vats
Alexander Wong
J. McPhee
103
68
0
16 Nov 2021
Task allocation for decentralized training in heterogeneous environment
Yongyue Chao
Ming-Ray Liao
Jiaxin Gao
31
0
0
16 Nov 2021
Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi
Huiyu Wang
Yingwei Li
Zizhang Li
Alan Yuille
ViT
81
3
0
15 Nov 2021
Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization
Byeonggeun Kim
Seunghan Yang
Jang-Hyun Kim
Simyung Chang
48
15
0
12 Nov 2021
Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
Łukasz Kuciński
Tomasz Korbak
P. Kołodziej
Piotr Miłoś
111
20
0
11 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
747
7,885
0
11 Nov 2021
Scaling ASR Improves Zero and Few Shot Learning
Alex Xiao
Weiyi Zheng
Gil Keren
Duc Le
Frank Zhang
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
Abdel-rahman Mohamed
80
23
0
10 Nov 2021
OSSEM: one-shot speaker adaptive speech enhancement using meta learning
Cheng Yu
Szu-Wei Fu
Tsun-An Hsieh
Yu Tsao
Mirco Ravanelli
VLM
84
4
0
10 Nov 2021
Are Transformers More Robust Than CNNs?
Yutong Bai
Jieru Mei
Alan Yuille
Cihang Xie
ViT, AAML
262
270
0
10 Nov 2021
Data Augmentation Can Improve Robustness
Sylvestre-Alvise Rebuffi
Sven Gowal
D. A. Calian
Florian Stimberg
Olivia Wiles
Timothy A. Mann
AAML
65
293
0
09 Nov 2021
A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks
Daniel Nichols
Siddharth Singh
Shuqing Lin
A. Bhatele
OOD
57
9
0
09 Nov 2021
BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning
Bicheng Ying
Kun Yuan
Hanbin Hu
Yiming Chen
W. Yin
FedML
83
28
0
08 Nov 2021
Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping
Junya Chen
Sijia Wang
Lawrence Carin
Chenyang Tao
39
3
0
04 Nov 2021
MixSiam: A Mixture-based Approach to Self-supervised Representation Learning
Xiaoyang Guo
Tianhao Zhao
Yutian Lin
Bo Du
SSL
62
6
0
04 Nov 2021
PatchGame: Learning to Signal Mid-level Patches in Referential Games
Kamal Gupta
Gowthami Somepalli
Anubhav Gupta
Vinoj Jayasundara
Matthias Zwicker
Abhinav Shrivastava
79
4
0
02 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
90
15
0
01 Nov 2021
To Talk or to Work: Delay Efficient Federated Learning over Mobile Edge Devices
Pavana Prakash
Jiahao Ding
Maoqiang Wu
Minglei Shu
Rong Yu
Miao Pan
FedML
66
3
0
01 Nov 2021
Learning Debiased and Disentangled Representations for Semantic Segmentation
Sanghyeok Chu
Dongwan Kim
Bohyung Han
70
22
0
31 Oct 2021
Sustainable AI: Environmental Implications, Challenges and Opportunities
Carole-Jean Wu
Ramya Raghavendra
Udit Gupta
Bilge Acun
Newsha Ardalani
...
Maximilian Balandat
Joe Spisak
R. Jain
Michael G. Rabbat
K. Hazelwood
159
418
0
30 Oct 2021
Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
Dinghao Fan
Hengjie Lu
Shugong Xu
Shan Cao
67
16
0
29 Oct 2021
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan
Xinqi Li
Cheng Cheng
Juncheng Liu
Ran Guo
...
Fei Yang
Xiaodong Yi
Chuan Wu
Haoran Zhang
Jie Zhao
62
41
0
28 Oct 2021
GenURL: A General Framework for Unsupervised Representation Learning
Siyuan Li
Zicheng Liu
Z. Zang
Di Wu
Zhiyuan Chen
Stan Z. Li
OOD, 3DGS, OffRL
136
9
0
27 Oct 2021
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
Boyao Wang
Haishan Ye
Tong Zhang
116
15
0
27 Oct 2021
Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying
Kun Yuan
Yiming Chen
Hanbin Hu
Pan Pan
W. Yin
FedML
115
89
0
26 Oct 2021
Parameter Prediction for Unseen Deep Architectures
Boris Knyazev
M. Drozdzal
Graham W. Taylor
Adriana Romero Soriano
OOD
119
83
0
25 Oct 2021
Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups
David M. Knigge
David W. Romero
Erik J. Bekkers
96
30
0
25 Oct 2021
ZerO Initialization: Initializing Neural Networks with only Zeros and Ones
Jiawei Zhao
Florian Schäfer
Anima Anandkumar
105
26
0
25 Oct 2021