Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
R4D: Utilizing Reference Objects for Long-Range Distance Estimation
Yingwei Li
Tiffany Chen
Maya Kabkab
Ruichi Yu
Longlong Jing
Yurong You
Hang Zhao
34
4
0
10 Jun 2022
One Hyper-Initializer for All Network Architectures in Medical Image Analysis
Fangxin Shang
Yehui Yang
Dalu Yang
Junde Wu
Xiaorong Wang
Yanwu Xu
AI4CE
73
2
0
08 Jun 2022
Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning
M. Seyfi
Amin Banitalebi-Dehkordi
Yong Zhang
SSL
59
0
0
07 Jun 2022
Extending Momentum Contrast with Cross Similarity Consistency Regularization
M. Seyfi
Amin Banitalebi-Dehkordi
Yong Zhang
SSL
68
12
0
07 Jun 2022
Siamese Encoder-based Spatial-Temporal Mixer for Growth Trend Prediction of Lung Nodules on CT Scans
Jiansheng Fang
Jingwen Wang
Anwei Li
Yuguang Yan
Yonghe Hou
Chao Song
Hongbo Liu
Jiang Liu
31
7
0
07 Jun 2022
Self-supervised Learning for Human Activity Recognition Using 700,000 Person-days of Wearable Data
H. Yuan
Shing Chan
Andrew P. Creagh
C. Tong
Aidan Acquah
David Clifton
Aiden Doherty
SSL
107
94
0
06 Jun 2022
Generalized Federated Learning via Sharpness Aware Minimization
Zhe Qu
Xingyu Li
Rui Duan
Yaojiang Liu
Bo Tang
Zhuo Lu
FedML
113
142
0
06 Jun 2022
MSR: Making Self-supervised learning Robust to Aggressive Augmentations
Ying-Long Bai
Erkun Yang
Zhaoqing Wang
Yuxuan Du
Bo Han
Cheng Deng
Dadong Wang
Tongliang Liu
SSL
85
3
0
04 Jun 2022
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Chenxin Tao
Xizhou Zhu
Weijie Su
Gao Huang
Bin Li
Jie Zhou
Yu Qiao
Xiaogang Wang
Jifeng Dai
SSL
111
97
0
02 Jun 2022
Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions
Kiwon Lee
Andrew N. Cheng
Courtney Paquette
Elliot Paquette
87
14
0
02 Jun 2022
Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction
Xiangyuan Yang
Jie Lin
Hanlin Zhang
Xinyu Yang
Peng Zhao
AAML
OOD
67
1
0
02 Jun 2022
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Eduard A. Gorbunov
Samuel Horváth
Peter Richtárik
Gauthier Gidel
AAML
53
0
0
01 Jun 2022
Optimization with Access to Auxiliary Information
El Mahdi Chayti
Sai Praneeth Karimireddy
AAML
107
10
0
01 Jun 2022
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining
Pengyuan Lyu
Chengquan Zhang
Shanshan Liu
Meina Qiao
Yangliu Xu
Liang Wu
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
122
43
0
01 Jun 2022
Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views
Junbo Zhang
Kaisheng Ma
98
47
0
01 Jun 2022
Byzantine-Robust Online and Offline Distributed Reinforcement Learning
Yiding Chen
Xuezhou Zhang
Kai Zhang
Mengdi Wang
Xiaojin Zhu
OffRL
133
18
0
01 Jun 2022
Glo-In-One: Holistic Glomerular Detection, Segmentation, and Lesion Characterization with Large-scale Web Image Mining
Tianyuan Yao
Yuzhe Lu
Jun Long
Aadarsh Jha
Zheyu Zhu
Zuhayr Asad
Haichun Yang
Agnes B. Fogo
Yuankai Huo
86
10
0
31 May 2022
Self-Supervised Visual Representation Learning with Semantic Grouping
Xin Wen
Bingchen Zhao
Anlin Zheng
Xinming Zhang
Xiaojuan Qi
SSL
226
74
0
30 May 2022
Efficient-Adam: Communication-Efficient Distributed Adam
Congliang Chen
Li Shen
Wei Liu
Zhi-Quan Luo
58
20
0
28 May 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang
Jin Gao
Zeming Li
Jian Sun
Weiming Hu
ViT
152
46
0
28 May 2022
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN
Siyuan Li
Di Wu
Fang Wu
Lei Shang
Stan Z. Li
84
50
0
27 May 2022
DLTTA: Dynamic Learning Rate for Test-time Adaptation on Cross-domain Medical Images
Hongzheng Yang
Cheng Chen
Meirui Jiang
Quande Liu
Jianfeng Cao
Pheng Ann Heng
Qi Dou
OOD
84
30
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
249
703
0
26 May 2022
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
Jihao Liu
Xin Huang
Jinliang Zheng
Yu Liu
Hongsheng Li
67
55
0
26 May 2022
Trainable Weight Averaging: Accelerating Training and Improving Generalization
Tao Li
Zhehao Huang
Yingwen Wu
Zhengbao He
Qinghua Tao
Xiaolin Huang
Chih-Jen Lin
MoMe
119
3
0
26 May 2022
Autoformalization with Large Language Models
Yuhuai Wu
Albert Q. Jiang
Wenda Li
M. Rabe
Charles Staats
M. Jamnik
Christian Szegedy
AI4CE
293
178
0
25 May 2022
Human Instance Matting via Mutual Guidance and Multi-Instance Refinement
Yanan Sun
Chi-Keung Tang
Yu-Wing Tai
3DH
108
26
0
22 May 2022
MultiBiSage: A Web-Scale Recommendation System Using Multiple Bipartite Graphs at Pinterest
Saket Gurukar
Nikil Pancha
Andrew Zhai
Eric Kim
Samson Hu
Srinivas Parthasarathy
Charles R. Rosenberg
J. Leskovec
120
16
0
21 May 2022
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate
N. H. Phong
A. Santos
B. Ribeiro
82
8
0
20 May 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi
Kaifeng Lyu
A. Panigrahi
Sanjeev Arora
158
47
0
20 May 2022
Scalable algorithms for physics-informed neural and graph networks
K. Shukla
Mengjia Xu
N. Trask
George Karniadakis
PINN
AI4CE
131
41
0
16 May 2022
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Dong Gu Lee
Wonseok Jeong
Sang Woo Kim
50
4
0
15 May 2022
ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training
Yue Zhao
Yantao Shen
Yuanjun Xiong
Shuo Yang
Wei Xia
Zhuowen Tu
Bernt Schiele
Stefano Soatto
BDL
103
6
0
12 May 2022
On Distributed Adaptive Optimization with Gradient Compression
Xiaoyun Li
Belhal Karimi
Ping Li
77
27
0
11 May 2022
Accelerating the Training of Video Super-Resolution Models
Lijian Lin
Xintao Wang
Zhongang Qi
Ying Shan
73
3
0
10 May 2022
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Mingrui Liu
Zhenxun Zhuang
Yunwei Lei
Chunyang Liao
79
20
0
10 May 2022
Large Scale Transfer Learning for Differentially Private Image Classification
Harsh Mehta
Abhradeep Thakurta
Alexey Kurakin
Ashok Cutkosky
87
41
0
06 May 2022
FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework
Guozheng Li
Xu Chen
Peng Wang
Jiafeng Xie
Qiqing Luo
ViT
82
12
0
05 May 2022
Few-Shot Document-Level Relation Extraction
Nicholas Popovic
Michael Färber
82
14
0
04 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
58
8
0
02 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
CLIP
76
122
0
02 May 2022
None Class Ranking Loss for Document-Level Relation Extraction
Yang Zhou
W. Lee
66
17
0
01 May 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
67
39
0
30 Apr 2022
AGIC: Approximate Gradient Inversion Attack on Federated Learning
Jin Xu
Chi Hong
Jiyue Huang
L. Chen
Jérémie Decouchant
AAML
FedML
86
25
0
28 Apr 2022
Unlocking High-Accuracy Differentially Private Image Classification through Scale
Soham De
Leonard Berrada
Jamie Hayes
Samuel L. Smith
Borja Balle
97
233
0
28 Apr 2022
ELM: Embedding and Logit Margins for Long-Tail Learning
Wittawat Jitkrittum
A. Menon
A. S. Rawat
Surinder Kumar
83
11
0
27 Apr 2022
3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective
Zhedong Zheng
Jiayin Zhu
Wei Ji
Yi Yang
Tat-Seng Chua
3DH
73
7
0
27 Apr 2022
Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
John Thorpe
Pengzhan Zhao
Jon Eyolfson
Yifan Qiao
Zhihao Jia
Minjia Zhang
Ravi Netravali
Guoqing Harry Xu
83
58
0
26 Apr 2022
WebFace260M: A Benchmark for Million-Scale Deep Face Recognition
Zheng Hua Zhu
Guan Huang
Jiankang Deng
Yun Ye
Junjie Huang
...
Jiagang Zhu
Tian Yang
Dalong Du
Jiwen Lu
Jie Zhou
CVBM
93
44
0
21 Apr 2022
A Masked Image Reconstruction Network for Document-level Relation Extraction
Li Zhang
Yidong Cheng
63
2
0
21 Apr 2022