arXiv:1904.00962, v5 (latest)
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
11 / 611 papers shown
Title
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Justin Sybrandt
Ilya Tyagin
M. Shtutman
Ilya Safro
54
37
0
13 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
96
52
0
10 Feb 2020
On the distance between two neural networks and the stability of learning
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Xuan Li
ODL
284
59
0
09 Feb 2020
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
95
280
0
15 Dec 2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
Cong Xie
Oluwasanmi Koyejo
Indranil Gupta
Yanghua Peng
73
42
0
20 Nov 2019
On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
289
801
0
25 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Ji Lin
Chuang Gan
Song Han
78
10
0
01 Oct 2019
Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang Zhao
Raghav Gupta
Yang Song
Denny Zhou
VLM
76
53
0
25 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
406
1,926
0
17 Sep 2019
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Haidong Rong
Yangzihao Wang
Feihu Zhou
Junjie Zhai
Haiyang Wu
...
Fan Li
Han Zhang
Yuekui Yang
Zhenyu Guo
Di Wang
OffRL
51
11
0
10 Sep 2019
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
93
23
0
26 Jul 2019