Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
arXiv:1706.02677 (v1, v2 latest)

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Title
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery
Xin Zhang
Liangxiu Han
SSL
92
3
0
27 Jun 2023
A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling
Ye Wang
Huazheng Pan
Tao Zhang
Wen Wu
Wen-zhong Hu
87
5
0
26 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
110
38
0
23 Jun 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
Aleksander Madry
66
70
0
21 Jun 2023
Continual Learners are Incremental Model Generalizers
Jaehong Yoon
Sung Ju Hwang
Yu Cao
CLL
86
5
0
21 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
82
3
0
18 Jun 2023
When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu
Bohan Wang
Huishuai Zhang
Zhizheng Zhang
Wei Chen
Na Zheng
64
10
0
15 Jun 2023
Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
Lin Zhang
Longteng Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
OffRL
45
7
0
15 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua
Inbar Hasidim
O. Shayevitz
M. Feder
61
0
0
14 Jun 2023
$\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning
Adel Nabli
Eugene Belilovsky
Edouard Oyallon
74
7
0
14 Jun 2023
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
Guojun Xiong
Gang Yan
Shiqiang Wang
Jian Li
101
4
0
11 Jun 2023
FLSL: Feature-level Self-supervised Learning
Qing Su
Anton Netchaev
Hai Helen Li
Shihao Ji
117
5
0
09 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
124
15
0
07 Jun 2023
Quasi-Newton Updating for Large-Scale Distributed Learning
Shuyuan Wu
Danyang Huang
Hansheng Wang
68
6
0
07 Jun 2023
Revisiting Conversation Discourse for Dialogue Disentanglement
Bobo Li
Hao Fei
Fei Li
Shengqiong Wu
Lizi Liao
Yin-wei Wei
Tat-Seng Chua
Donghong Ji
57
1
0
06 Jun 2023
A Scalable and Adaptive System to Infer the Industry Sectors of Companies: Prompt + Model Tuning of Generative Language Models
Le-le Cao
Vilhelm von Ehrenheim
Astrid Berghult
Cecilia Henje
Richard Anselmo Stahl
Joar Wandborg
S. Stan
Armin Catovic
Erik Ferm
Hannes Ingelhag
68
4
0
05 Jun 2023
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Sunny Sanyal
A. Neerkaje
Jean Kaddour
Abhishek Kumar
Sujay Sanghavi
MoMe
102
19
0
05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
156
15
0
05 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
77
4
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
102
25
0
04 Jun 2023
Masked Autoencoder for Unsupervised Video Summarization
Minho Shim
Taeoh Kim
Jinhyung Kim
Dongyoon Wee
49
1
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
128
190
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&Ro, OffRL, LRM, AI4CE
89
29
0
01 Jun 2023
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects
S. Thalhammer
Jean-Baptiste Weibel
Markus Vincze
Jose Garcia-Rodriguez
ViT
93
10
0
31 May 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL, VLM, CLIP
114
177
0
31 May 2023
On Convergence of Incremental Gradient for Non-Convex Smooth Functions
Anastasia Koloskova
N. Doikov
Sebastian U. Stich
Martin Jaggi
77
3
0
30 May 2023
An AMR-based Link Prediction Approach for Document-level Event Argument Extraction
Yuqing Yang
Qipeng Guo
Xiangkun Hu
Yue Zhang
Xipeng Qiu
Zheng Zhang
386
27
0
30 May 2023
Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok
Chang-Jae Cho
50
0
0
30 May 2023
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Gintare Karolina Dziugaite
Baharan Mirzasoleiman
81
28
0
30 May 2023
Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees
Jihao Xin
Marco Canini
Peter Richtárik
Samuel Horváth
88
2
0
29 May 2023
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
MLT
123
29
0
29 May 2023
Rethinking PRL: A Multiscale Progressively Residual Learning Network for Inverse Halftoning
Feiyu Li
Jun Yang
36
1
0
27 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
160
205
0
27 May 2023
Detecting Heart Disease from Multi-View Ultrasound Images via Supervised Attention Multiple Instance Learning
Zhe Huang
B. Wessler
M. C. Hughes
108
4
0
25 May 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu
Guanchu Wang
Shaochen Zhong
Zhaozhuo Xu
Daochen Zha
...
Zhimeng Jiang
Kaixiong Zhou
Vipin Chaudhary
Shuai Xu
Helen Zhou
102
15
0
24 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
69
4
0
24 May 2023
Run Like a Girl! Sports-Related Gender Bias in Language and Vision
S. Harrison
Eleonora Gualdoni
Gemma Boleda
47
6
0
23 May 2023
Siamese Masked Autoencoders
Agrim Gupta
Jiajun Wu
Jia Deng
Li Fei-Fei
88
55
0
23 May 2023
Disentangled Variational Autoencoder for Emotion Recognition in Conversations
Kailai Yang
Tianlin Zhang
Sophia Ananiadou
DRL
95
11
0
23 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP, VLM
129
27
0
23 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML, AAML
98
0
0
23 May 2023
ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction
Vahid Janfaza
Shantanu Mandal
Farabi Mahmud
A. Muzahid
59
2
0
22 May 2023
TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers
Julian Moosmann
Marco Giordano
Christian Vogt
Michele Magno
MQ, ObjD
60
20
0
22 May 2023
"What do others think?": Task-Oriented Conversational Modeling with Subjective Knowledge
Chao Zhao
Spandana Gella
Seokhwan Kim
Di Jin
Devamanyu Hazarika
Alexandros Papangelis
Behnam Hedayatnia
Mahdi Namazifar
Yang Liu
Dilek Z. Hakkani-Tür
94
7
0
20 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu
Vladimir Braverman
Jason D. Lee
65
21
0
19 May 2023
Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
Aran Nayebi
R. Rajalingham
M. Jazayeri
G. R. Yang
76
20
0
19 May 2023
Reciprocal Attention Mixing Transformer for Lightweight Image Restoration
Haram Choi
Cheolwoong Na
Jihyeon Oh
Seungjae Lee
Jinseop S. Kim
Subeen Choe
Jeongmin Lee
Taehoon Kim
Jihoon Yang
92
9
0
19 May 2023
Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation
Jiong Zhu
Aishwarya N. Reganti
E-Wen Huang
Charles Dickens
Nikhil S. Rao
Karthik Subbian
Danai Koutra
GNN, FedML
80
3
0
17 May 2023
LoViT: Long Video Transformer for Surgical Phase Recognition
Yang Liu
Maxence Boels
Luis C. Garcia-Peraza-Herrera
Tom Vercauteren
P. Dasgupta
Alejandro Granados
Sebastien Ourselin
127
35
0
15 May 2023