arXiv: 1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes


1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
    ODL
ArXiv (abs) | PDF | HTML | GitHub (1698★)

Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"

11 / 611 papers shown
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Justin Sybrandt
Ilya Tyagin
M. Shtutman
Ilya Safro
54
37
0
13 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
96
52
0
10 Feb 2020
On the distance between two neural networks and the stability of learning
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Xuan Li
ODL
284
59
0
09 Feb 2020
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
95
280
0
15 Dec 2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
Cong Xie
Oluwasanmi Koyejo
Indranil Gupta
Yanghua Peng
73
42
0
20 Nov 2019
On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
289
801
0
25 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Ji Lin
Chuang Gan
Song Han
78
10
0
01 Oct 2019
Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang Zhao
Raghav Gupta
Yang Song
Denny Zhou
VLM
76
53
0
25 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
406
1,926
0
17 Sep 2019
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Haidong Rong
Yangzihao Wang
Feihu Zhou
Junjie Zhai
Haiyang Wu
...
Fan Li
Han Zhang
Yuekui Yang
Zhenyu Guo
Di Wang
OffRL
51
11
0
10 Sep 2019
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
93
23
0
26 Jul 2019