1706.02677
Cited By
v1
v2 (latest)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
Papers citing
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
50 / 2,054 papers shown
Title
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery
Xin Zhang
Liangxiu Han
SSL
92
3
0
27 Jun 2023
A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling
Ye Wang
Huazheng Pan
Tao Zhang
Wen Wu
Wen-zhong Hu
87
5
0
26 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
110
38
0
23 Jun 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
Aleksander Madry
66
70
0
21 Jun 2023
Continual Learners are Incremental Model Generalizers
Jaehong Yoon
Sung Ju Hwang
Yu Cao
CLL
86
5
0
21 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
82
3
0
18 Jun 2023
When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu
Bohan Wang
Huishuai Zhang
Zhizheng Zhang
Wei Chen
Na Zheng
64
10
0
15 Jun 2023
Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
Lin Zhang
Longteng Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
OffRL
45
7
0
15 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua
Inbar Hasidim
O. Shayevitz
M. Feder
61
0
0
14 Jun 2023
$\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning
Adel Nabli
Eugene Belilovsky
Edouard Oyallon
74
7
0
14 Jun 2023
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
Guojun Xiong
Gang Yan
Shiqiang Wang
Jian Li
101
4
0
11 Jun 2023
FLSL: Feature-level Self-supervised Learning
Qing Su
Anton Netchaev
Hai Helen Li
Shihao Ji
117
5
0
09 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
124
15
0
07 Jun 2023
Quasi-Newton Updating for Large-Scale Distributed Learning
Shuyuan Wu
Danyang Huang
Hansheng Wang
68
6
0
07 Jun 2023
Revisiting Conversation Discourse for Dialogue Disentanglement
Bobo Li
Hao Fei
Fei Li
Shengqiong Wu
Lizi Liao
Yin-wei Wei
Tat-Seng Chua
Donghong Ji
57
1
0
06 Jun 2023
A Scalable and Adaptive System to Infer the Industry Sectors of Companies: Prompt + Model Tuning of Generative Language Models
Le-le Cao
Vilhelm von Ehrenheim
Astrid Berghult
Cecilia Henje
Richard Anselmo Stahl
Joar Wandborg
S. Stan
Armin Catovic
Erik Ferm
Hannes Ingelhag
68
4
0
05 Jun 2023
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Sunny Sanyal
A. Neerkaje
Jean Kaddour
Abhishek Kumar
Sujay Sanghavi
MoMe
102
19
0
05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
156
15
0
05 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
77
4
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
102
25
0
04 Jun 2023
Masked Autoencoder for Unsupervised Video Summarization
Minho Shim
Taeoh Kim
Jinhyung Kim
Dongyoon Wee
49
1
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
128
190
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&Ro
OffRL
LRM
AI4CE
89
29
0
01 Jun 2023
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects
S. Thalhammer
Jean-Baptiste Weibel
Markus Vincze
Jose Garcia-Rodriguez
ViT
93
10
0
31 May 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
114
177
0
31 May 2023
On Convergence of Incremental Gradient for Non-Convex Smooth Functions
Anastasia Koloskova
N. Doikov
Sebastian U. Stich
Martin Jaggi
77
3
0
30 May 2023
An AMR-based Link Prediction Approach for Document-level Event Argument Extraction
Yuqing Yang
Qipeng Guo
Xiangkun Hu
Yue Zhang
Xipeng Qiu
Zheng Zhang
386
27
0
30 May 2023
Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok
Chang-Jae Cho
50
0
0
30 May 2023
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Gintare Karolina Dziugaite
Baharan Mirzasoleiman
81
28
0
30 May 2023
Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees
Jihao Xin
Marco Canini
Peter Richtárik
Samuel Horváth
88
2
0
29 May 2023
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
MLT
123
29
0
29 May 2023
Rethinking PRL: A Multiscale Progressively Residual Learning Network for Inverse Halftoning
Feiyu Li
Jun Yang
36
1
0
27 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
160
205
0
27 May 2023
Detecting Heart Disease from Multi-View Ultrasound Images via Supervised Attention Multiple Instance Learning
Zhe Huang
B. Wessler
M. C. Hughes
108
4
0
25 May 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu
Guanchu Wang
Shaochen Zhong
Zhaozhuo Xu
Daochen Zha
...
Zhimeng Jiang
Kaixiong Zhou
Vipin Chaudhary
Shuai Xu
Helen Zhou
102
15
0
24 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
69
4
0
24 May 2023
Run Like a Girl! Sports-Related Gender Bias in Language and Vision
S. Harrison
Eleonora Gualdoni
Gemma Boleda
47
6
0
23 May 2023
Siamese Masked Autoencoders
Agrim Gupta
Jiajun Wu
Jia Deng
Li Fei-Fei
88
55
0
23 May 2023
Disentangled Variational Autoencoder for Emotion Recognition in Conversations
Kailai Yang
Tianlin Zhang
Sophia Ananiadou
DRL
95
11
0
23 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP
VLM
129
27
0
23 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML
AAML
98
0
0
23 May 2023
ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction
Vahid Janfaza
Shantanu Mandal
Farabi Mahmud
A. Muzahid
59
2
0
22 May 2023
TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers
Julian Moosmann
Marco Giordano
Christian Vogt
Michele Magno
MQ
ObjD
60
20
0
22 May 2023
"What do others think?": Task-Oriented Conversational Modeling with Subjective Knowledge
Chao Zhao
Spandana Gella
Seokhwan Kim
Di Jin
Devamanyu Hazarika
Alexandros Papangelis
Behnam Hedayatnia
Mahdi Namazifar
Yang Liu
Dilek Z. Hakkani-Tür
94
7
0
20 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu
Vladimir Braverman
Jason D. Lee
65
21
0
19 May 2023
Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
Aran Nayebi
R. Rajalingham
M. Jazayeri
G. R. Yang
76
20
0
19 May 2023
Reciprocal Attention Mixing Transformer for Lightweight Image Restoration
Haram Choi
Cheolwoong Na
Jihyeon Oh
Seungjae Lee
Jinseop S. Kim
Subeen Choe
Jeongmin Lee
Taehoon Kim
Jihoon Yang
92
9
0
19 May 2023
Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation
Jiong Zhu
Aishwarya N. Reganti
E-Wen Huang
Charles Dickens
Nikhil S. Rao
Karthik Subbian
Danai Koutra
GNN
FedML
80
3
0
17 May 2023
LoViT: Long Video Transformer for Surgical Phase Recognition
Yang Liu
Maxence Boels
Luis C. Garcia-Peraza-Herrera
Tom Vercauteren
P. Dasgupta
Alejandro Granados
Sebastien Ourselin
127
35
0
15 May 2023