1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing
"Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
50 / 611 papers shown
Title
Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation
Xin Yuan
Pedro H. P. Savarese
Michael Maire
74
5
0
22 Jun 2023
Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment
Shivam Pande
Nassim Ait Ali Braham
Yi Wang
C. Albrecht
Biplab Banerjee
Xiao Xiang Zhu
SSL
50
0
0
19 Jun 2023
Towards Stability of Autoregressive Neural Operators
Michael McCabe
P. Harrington
Shashank Subramanian
Jed Brown
AI4CE
157
20
0
18 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
101
3
0
18 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
64
0
0
15 Jun 2023
Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
Lin Zhang
Longteng Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
OffRL
52
7
0
15 Jun 2023
Self-Supervised Polyp Re-Identification in Colonoscopy
Yotam Intrator
N. Aizenberg
Amir Livne
Ehud Rivlin
Roman Goldenberg
69
6
0
14 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
103
18
0
09 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
122
72
0
09 Jun 2023
KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow
Zhiyao Li
Mingyu Gao
49
1
0
09 Jun 2023
ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function
Jinyoung Kim
Soonwoo Kwon
Hyojun Go
Yunsung Lee
Seungtaek Choi
Hyun-Gyoon Kim
87
1
0
07 Jun 2023
BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs
Zhiyong Yang
Tinglin Huang
Ming Ding
Yuxiao Dong
Rex Ying
Yukuo Cen
Yangli-ao Geng
Jie Tang
SSL
VLM
91
9
0
06 Jun 2023
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Alexander Isenko
R. Mayer
Hans-Arno Jacobsen
86
8
0
05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
162
15
0
05 Jun 2023
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao
Zhuowei Li
Di Liu
Mu Zhou
Shaoting Zhang
Dimitris N. Metaxas
MedIm
100
13
0
04 Jun 2023
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates
Mohammad Mozaffari
Sikan Li
Zhao Zhang
M. Dehnavi
76
4
0
02 Jun 2023
Data-Efficient French Language Modeling with CamemBERTa
Wissam Antoun
Benoît Sagot
Djamé Seddah
67
7
0
02 Jun 2023
On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel
Michael Wand
104
2
0
01 Jun 2023
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects
S. Thalhammer
Jean-Baptiste Weibel
Markus Vincze
Jose Garcia-Rodriguez
ViT
109
10
0
31 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
167
205
0
27 May 2023
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson
Bettina Messmer
Martin Jaggi
91
18
0
26 May 2023
Future-conditioned Unsupervised Pretraining for Decision Transformer
Zhihui Xie
Zichuan Lin
Deheng Ye
Qiang Fu
Wei Yang
Shuai Li
OffRL
OnRL
92
23
0
26 May 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
70
0
0
25 May 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong
Liang Ding
Juhua Liu
Xuebo Liu
Min Zhang
Bo Du
Dacheng Tao
VLM
75
10
0
24 May 2023
Beyond Individual Input for Deep Anomaly Detection on Tabular Data
Hugo Thimonier
Fabrice Popineau
Arpad Rimmel
Bich-Liên Doan
87
6
0
24 May 2023
BinaryViT: Towards Efficient and Accurate Binary Vision Transformers
Junrui Xiao
Zhikai Li
Lianwei Yang
Qingyi Gu
MQ
ViT
112
2
0
24 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
144
149
0
23 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML
AAML
103
0
0
23 May 2023
VanillaNet: the Power of Minimalism in Deep Learning
Hanting Chen
Yunhe Wang
Jianyuan Guo
Dacheng Tao
VLM
90
96
0
22 May 2023
Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Yanjing Li
Sheng Xu
Mingbao Lin
Xianbin Cao
Chuanjian Liu
Xiao Sun
Baochang Zhang
ViT
MQ
97
11
0
21 May 2023
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Denis Tarasov
Vladislav Kurenkov
Alexander Nikulin
Sergey Kolesnikov
OffRL
103
51
0
16 May 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
66
3
0
09 May 2023
Predicting COVID-19 and pneumonia complications from admission texts
D. Umerenkov
O. Cherkashin
Alexander Nesterov
Victor A. Gombolevskiy
I. Demko
Alexander Yalunin
V. Kokh
25
0
0
05 May 2023
Stimulative Training++: Go Beyond The Performance Limits of Residual Networks
XinYu Piao
Tong He
DoangJoo Synn
Baopu Li
Tao Chen
Lei Bai
Jong-Kook Kim
103
4
0
04 May 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
100
75
0
27 Apr 2023
The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik
Jaesik Choi
81
1
0
23 Apr 2023
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
Yihao Chen
Xianbiao Qi
Jianan Wang
Lei Zhang
82
18
0
17 Apr 2023
Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Guanyi Qin
R. Hu
Yutao Liu
Xiawu Zheng
Haotian Liu
Xiu Li
Yan Zhang
ViT
75
68
0
11 Apr 2023
An autoencoder compression approach for accelerating large-scale inverse problems
J. Wittmer
Jacob Badger
H. Sundar
T. Bui-Thanh
AI4CE
65
1
0
10 Apr 2023
SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Kfir Y. Levy
FedML
110
3
0
09 Apr 2023
Can we learn better with hard samples?
Subin Sahayam
John Zakkam
Umarani Jayaraman
80
2
0
07 Apr 2023
Tag that issue: Applying API-domain labels in issue tracking systems
Fabio Santos
Joseph Vargovich
Bianca Trinkenreich
Í. Santos
Jacob Penney
...
João Felipe Pimentel
I. Wiese
Igor Steinmacher
A. Sarma
M. Gerosa
49
5
0
06 Apr 2023
The Stable Signature: Rooting Watermarks in Latent Diffusion Models
Pierre Fernandez
Guillaume Couairon
Hervé Jégou
Matthijs Douze
Teddy Furon
WIGM
135
198
0
27 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
160
513
0
27 Mar 2023
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture
Peiyu Liu
Ze-Feng Gao
Yushuo Chen
Wayne Xin Zhao
Ji-Rong Wen
MoE
74
0
0
27 Mar 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
72
1
0
24 Mar 2023
GiveMeLabeledIssues: An Open Source Issue Recommendation System
Joseph Vargovich
Fabio Santos
Jacob Penney
M. Gerosa
Igor Steinmacher
VLM
54
10
0
23 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
73
12
0
22 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
148
289
0
20 Mar 2023