Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
arXiv:2104.04473, 9 April 2021
Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, V. Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
[MoE]

Papers citing "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" (showing 50 of 366)

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, ..., Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, W. Liang. 11 Jan 2024. [MoE]
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, ..., Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou. 05 Jan 2024. [LRM, ALM]
Training and Serving System of Foundation Models: A Comprehensive Survey. Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuan-fu Zhang, Zibin Zheng. 05 Jan 2024. [ALM]
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe. Mincong Huang, Chao Wang, Chi Ma, Yineng Zhang, Peng Zhang, Lei Yu. 04 Jan 2024.
Understanding LLMs: A Comprehensive Overview from Training to Inference. Yi-Hsueh Liu, Haoyang He, Tianle Han, Xu-Yao Zhang, Mengyuan Liu, ..., Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge. 04 Jan 2024. [SyDa]
IoT in the Era of Generative AI: Vision and Challenges. Xin Wang, Zhongwei Wan, Arvin Hekmati, M. Zong, Samiul Alam, Mi Zhang, Bhaskar Krishnamachari. 03 Jan 2024.
Unicron: Economizing Self-Healing LLM Training at Scale. Tao He, Xue Li, Zhibin Wang, Kun Qian, Jingbo Xu, Wenyuan Yu, Jingren Zhou. 30 Dec 2023.
Spike No More: Stabilizing the Pre-training of Large Language Models. Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki. 28 Dec 2023.
Preference as Reward, Maximum Preference Optimization with Importance Sampling. Zaifan Jiang, Xing Huang, Chao Wei. 27 Dec 2023.
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference. Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang. 23 Dec 2023.
Optimizing Distributed Training on Frontier for Large Language Models. Sajal Dash, Isaac Lyngaas, Junqi Yin, Xiao Wang, Romain Egele, Guojing Cong, Feiyi Wang, Prasanna Balaprakash. 20 Dec 2023. [ALM, MoE]
SPT: Fine-Tuning Transformer-based Language Models Efficiently with Sparsification. Yuntao Gui, Xiao Yan, Peiqi Yin, Han Yang, James Cheng. 16 Dec 2023.
TigerBot: An Open Multilingual Multitask LLM. Ye Chen, Wei Cai, Liangming Wu, Xiaowei Li, Zhanxuan Xin, Cong Fu. 14 Dec 2023.
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu. 14 Dec 2023.
LLM360: Towards Fully Transparent Open-Source LLMs. Zhengzhong Liu, Aurick Qiao, W. Neiswanger, Hongyi Wang, Bowen Tan, ..., Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric P. Xing. 11 Dec 2023.
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections. Marcel Wagenlander, Guo Li, Bo Zhao, Luo Mai, Peter R. Pietzuch. 08 Dec 2023.
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism. Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou. 08 Dec 2023. [LRM]
Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Ke Tan, Fu Wu, Jiezhong Qiu, Aimin Pan. 06 Dec 2023.
FlexModel: A Framework for Interpretability of Distributed Large Language Models. Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson. 05 Dec 2023. [AI4CE, ALM]
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang. 01 Dec 2023.
Exploring the Robustness of Decentralized Training for Large Language Models. Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou. 01 Dec 2023.
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction. Lei Guan, Dongsheng Li, Jiye Liang, Wenjian Wang, Xicheng Lu. 01 Dec 2023.
Distributed Global Structure-from-Motion with a Deep Front-End. Ayush Baid, John Lambert, Travis Driver, Akshay Krishnan, H. Stepanyan, F. Dellaert. 30 Nov 2023. [3DGS]
Zero Bubble Pipeline Parallelism. Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin. 30 Nov 2023.
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, ..., Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut. 27 Nov 2023. [LM&MA, AI4MH, MedIm]
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training. Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu. 27 Nov 2023.
Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search. Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang. 26 Nov 2023.
Robot Learning in the Era of Foundation Models: A Survey. Xuan Xiao, Jiahang Liu, Zhipeng Wang, Yanmin Zhou, Yong Qi, Qian Cheng, Bin He, Shuo Jiang. 24 Nov 2023. [AI4CE, LM&Ro]
nach0: Multimodal Natural and Chemical Languages Foundation Model. M. Livne, Z. Miftahutdinov, E. Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, ..., Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alán Aspuru-Guzik, Alex Zhavoronkov. 21 Nov 2023. [AI4CE]
Zero redundancy distributed learning with differential privacy. Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis. 20 Nov 2023.
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment. Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, Binhang Yuan. 20 Nov 2023. [SyDa]
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines. Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu. 17 Nov 2023. [MoE, AI4CE]
Efficient Parallelization Layouts for Large-Scale Distributed Model Training. Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo. 09 Nov 2023. [LRM]
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training. M. Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, Mahzabeen Islam. 08 Nov 2023.
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models. Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, ..., Rui Guo, Xin Wang, Qiong Luo, S. Shi, Xiaowen Chu. 07 Nov 2023.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters. Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, ..., Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica. 06 Nov 2023. [MoE]
RTP: Rethinking Tensor Parallelism with Memory Deduplication. Cheng Luo, Tianle Zhong, Geoffrey C. Fox. 02 Nov 2023.
AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training. Qiaoling Chen, Qi Hu, Guoteng Wang, Zhisheng Ye, Ting Huang, ..., Yang Gao, Hang Yan, Yonggang Wen, Tianwei Zhang, Peng Sun. 01 Nov 2023.
Recipes for calibration and validation of agent-based models in cancer biomedicine. Nicolò Cogno, Cristian Axenie, Roman Bauer, V. Vavourakis. 30 Oct 2023. [AI4CE]
Skywork: A More Open Bilingual Foundation Model. Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, ..., Yongyi Peng, Xiaojuan Liang, Shuicheng Yan, Han Fang, Yahui Zhou. 30 Oct 2023.
Punica: Multi-Tenant LoRA Serving. Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy. 28 Oct 2023.
The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-sectional Survey Analysis. Hamid Shamszare, Avishek Choudhury. 20 Oct 2023.
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models. Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou. 19 Oct 2023.
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining. Yuxin Wang, S. Shi, Xin He, Zhenheng Tang, Xinglin Pan, Yang Zheng, Xiaoyu Wu, Amelie Chi Zhou, Bingsheng He, Xiaowen Chu. 19 Oct 2023. [KELM]
TRANSOM: An Efficient Fault-Tolerant System for Training LLMs. Baodong Wu, Lei Xia, Qingping Li, Kangyu Li, Xu Chen, Yongqiang Guo, Tieyao Xiang, Yuheng Chen, Shigang Li. 16 Oct 2023.
Tokenizer Choice For LLM Training: Negligible or Crucial? Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, ..., Malte Ostendorff, Samuel Weinbach, R. Sifa, Stefan Kesselheim, Nicolas Flores-Herr. 12 Oct 2023.
BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models. Haoxiang Luo, Jian Luo, Athanasios V. Vasilakos. 10 Oct 2023.
Rethinking Memory and Communication Cost for Efficient Large Language Model Training. Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, ..., Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou. 09 Oct 2023.
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators. M. Emani, Sam Foreman, Varuni K. Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, R. Thakur, V. Vishwanath, M. Papka. 06 Oct 2023. [ELM]
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas. 05 Oct 2023. [AAML]