Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (arXiv:2104.04473)
9 April 2021
Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, V. Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia · MoE
Papers citing "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" (16 of 366 papers shown)
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Bin-Rui Li, Zhongming Yu, Tian Zheng, R. Wu, Xiwen Zhang, Jian Peng, Yang You · AI4CE
02 Mar 2022
Survey on Large Scale Neural Network Training
Julia Gusak, Daria Cherniuk, Alena Shilova, A. Katrutsa, Daniel Bershatsky, ..., Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan V. Oseledets, Olivier Beaumont
21 Feb 2022
XAI for Transformers: Better Explanations through Conservative Propagation
Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf · FAtt
15 Feb 2022
Integrating AI Planning with Natural Language Processing: A Combination of Explicit and Tacit Knowledge
Kebing Jin, H. Zhuo
15 Feb 2022
Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy
07 Feb 2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, ..., Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica · MoE
28 Jan 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith, M. Patwary, Brandon Norick, P. LeGresley, Samyam Rajbhandari, ..., M. Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro · MoE
28 Jan 2022
End-to-end Adaptive Distributed Training on PaddlePaddle
Yulong Ao, Zhihua Wu, Dianhai Yu, Weibao Gong, Zhiqing Kui, ..., Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang
06 Dec 2021
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You · GNN
28 Oct 2021
Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis
20 Oct 2021
DEMix Layers: Disentangling Domains for Modular Language Modeling
Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer · KELM, MoE
11 Aug 2021
Distributed Deep Learning in Open Collaborations
Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, ..., Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko · FedML
18 Jun 2021
Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi
12 May 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He · MoE
18 Jan 2021
Srifty: Swift and Thrifty Distributed Training on the Cloud
Liangchen Luo, Peter West, Arvind Krishnamurthy, Luis Ceze
29 Nov 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE
17 Sep 2019