Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
arXiv:2104.04473 · 9 April 2021
Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, V. Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
Community: MoE
Papers citing "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" (50 of 366 papers shown)
DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Lin Zhang, S. Shi, X. Chu, Wei Wang, Bo-wen Li, Chengjian Liu · 24 Feb 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, ..., Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica · 22 Feb 2023 · MoE

A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques
Wenbin Li, Hakim Hacid, Ebtesam Almazrouei, Merouane Debbah · 16 Feb 2023

THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Minghao Li, Ran Ben-Basat, S. Vargaftik, Chon-In Lao, Ke Xu, Michael Mitzenmacher, Minlan Yu (Harvard University) · 16 Feb 2023

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, W. Lin · 16 Feb 2023 · AI4CE

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang · 16 Feb 2023

SWIFT: Expedited Failure Recovery for Large-scale DNN Training
Keon Jang, Hassan M. G. Wassel, Behnam Montazeri, Michael Ryan, David Wetherall · 13 Feb 2023

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen · 02 Feb 2023

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation
Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan Wu, Jialin Li, Wei Lin · 01 Feb 2023 · GNN

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee · 24 Jan 2023 · GNN

Does compressing activations help model parallel training?
S. Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman · 06 Jan 2023

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe · 20 Dec 2022

Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking
Mingyu Lee, Jun-Hyung Park, Junho Kim, Kang-Min Kim, SangKeun Lee · 15 Dec 2022

Elixir: Train a Large Language Model on a Small GPU Cluster
Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You · 10 Dec 2022 · VLM

An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning
S. Shi, Qing Yang, Yang Xiang, Shuhan Qi, Xinyu Wang · 30 Nov 2022

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
D. Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Bambhaniya, T. Krishna, Alexandros Daglis · 30 Nov 2022 · VLM

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia · 29 Nov 2022 · MoE

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa, Shigang Li, Torsten Hoefler · 25 Nov 2022 · AI4CE

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui · 25 Nov 2022 · GNN, MoE

Breadth-First Pipeline Parallelism
J. Lamy-Poirier · 11 Nov 2022 · GNN, MoE, AI4CE

Prompt Learning for Domain Adaptation in Task-Oriented Dialogue
Makesh Narsimhan Sreedhar, Christopher Parisien · 10 Nov 2022

On Optimizing the Communication of Model Parallelism
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang · 10 Nov 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf · 09 Nov 2022 · VLM

RCD-SGD: Resource-Constrained Distributed SGD in Heterogeneous Environment via Submodular Partitioning
Haoze He, Parijat Dube · 02 Nov 2022

What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy · 27 Oct 2022 · MoE, AI4CE

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Dacheng Li, Hongyi Wang, Eric P. Xing, Haotong Zhang · 13 Oct 2022 · MoE

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson, B. Kailkhura, Davis W. Blalock · 13 Oct 2022

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee · 08 Oct 2022 · MQ

Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, R. Rana, Suranga Nanayakkara · 06 Oct 2022 · VLM

WeLM: A Well-Read Pre-trained Language Model for Chinese
Hui Su, Xiao Zhou, Houjin Yu, Xiaoyu Shen, Yuwen Chen, Zilin Zhu, Yang Yu, Jie Zhou · 21 Sep 2022

Retrieval-based Controllable Molecule Generation
Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar · 23 Aug 2022

Dive into Big Model Training
Qinghua Liu, Yuxiang Jiang · 25 Jul 2022 · MoMe, AI4CE, LRM

MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant Systems for Machine Learning
Baolin Li, Tirthak Patel, S. Samsi, V. Gadepally, Devesh Tiwari · 23 Jul 2022

Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos, J. Sevilla, T. Besiroglu, Lennart Heim, A. Ho, Marius Hobbhahn · 05 Jul 2022 · ALM, ELM, AI4CE

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, A. A. Awan, Cheng-rong Li, ..., Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He · 30 Jun 2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ..., Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu · 22 Jun 2022 · EGVM

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee · 20 Jun 2022 · MQ

Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai, Shengwei Li, Xudong Tang, Ke-shi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li · 10 Jun 2022

Visually-Augmented Language Modeling
Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei · 20 May 2022 · VLM

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking
Yixuan Qiao, Hao Chen, Jun Wang, Yongquan Lai, Tuozhen Liu, ..., Xin Tang, Rui Fang, Peng Gao, Wenfeng Xie, Guotong Xie · 18 May 2022

Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro · 10 May 2022 · AI4CE

A Survey on AI Sustainability: Emerging Trends on Learning Algorithms and Research Challenges
Zhenghua Chen, Min-man Wu, Alvin Chan, Xiaoli Li, Yew-Soon Ong · 08 May 2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin · 30 Apr 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou · 28 Apr 2022

Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
John Thorpe, Pengzhan Zhao, Jon Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu · 26 Apr 2022

Enabling All In-Edge Deep Learning: A Literature Review
Praveen Joshi, Mohammed Hasanuzzaman, Chandra Thapa, Haithem Afli, T. Scully · 07 Apr 2022

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel · 05 Apr 2022 · PILM, LRM

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
Xiaonan Nie, Pinxue Zhao, Xupeng Miao, Tong Zhao, Bin Cui · 28 Mar 2022 · MoE

Pathways: Asynchronous Distributed Dataflow for ML
P. Barham, Aakanksha Chowdhery, J. Dean, Sanjay Ghemawat, Steven Hand, ..., Parker Schuh, Ryan Sepassi, Laurent El Shafey, C. A. Thekkath, Yonghui Wu · 23 Mar 2022 · GNN, MoE

Deep Lexical Hypothesis: Identifying personality structure in natural language
A. Cutler, D. Condon · 04 Mar 2022