ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

9 April 2021
Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, V. Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
Tags: MoE

Papers citing "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM"

50 / 366 papers shown
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference
S. Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, J. Kepner, Devesh Tiwari, V. Gadepally
04 Oct 2023

Effective Long-Context Scaling of Foundation Models
Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, ..., Dániel Baráth, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
27 Sep 2023

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz, S. Kaneda, Ruhan Wang, Rita Osi, Parteek Sharma, Fan Chen, Lei Jiang
25 Sep 2023

Baichuan 2: Open Large-scale Language Models
Ai Ming Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, ..., Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu
Tags: ELM, LRM
19 Sep 2023
A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services
Hongsheng Hu, Shuo Wang, Jiamin Chang, Haonan Zhong, Ruoxi Sun, Shuang Hao, Haojin Zhu, Minhui Xue
Tags: MU
15 Sep 2023

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, Mosharaf Chowdhury
Tags: MoE, AI4CE, OODD
15 Sep 2023

DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, J. An, Shiwen Mao
Tags: ViT
10 Sep 2023

Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education
M. Abedi, Ibrahem Alshybani, M. Shahadat, M. Murillo
09 Sep 2023

Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Kabir Nagrecha, Arun Kumar
03 Sep 2023
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs
Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, ..., Rongfei Zeng, Kaiyong Zhao, S. Shi, Bingsheng He, Xiaowen Chu
03 Sep 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
Ziming Liu, Shenggan Cheng, Hao Zhou, Yang You
30 Aug 2023

Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao, Su He, Zhenyu He, Jiacheng Lin, Qizhi Pei, Jie Shao, Wei Zhang
Tags: LM&MA, SyDa
27 Aug 2023

PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
Runzhou Han, Mai Zheng, S. Byna, Houjun Tang, Bin Dong, ..., Yong Chen, Dongkyun Kim, Joseph Hassoun, D. Thorsley, Matthew Wolf
02 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li
31 Jul 2023

GridMM: Grid Memory Map for Vision-and-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang
24 Jul 2023

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks
Zheyuan Hu, K. Shukla, George Karniadakis, Kenji Kawaguchi
Tags: PINN, AI4CE
23 Jul 2023

Applying QNLP to sentiment analysis in finance
Jonas Stein, Ivo Christ, Nico Kraus, M. Mansky, Robert Muller, Claudia Linnhoff-Popien
20 Jul 2023

ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi, G. Mujtaba, Ngan Le, Gianfranco Doretto, Don Adjeroh
Tags: LM&MA, AI4MH
09 Jul 2023
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
05 Jul 2023

Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo, Viktor Pfanschilling, Devendra Singh Dhami, Kristian Kersting
Tags: NAI
03 Jul 2023

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping
Daniel Zou, X. Jin, Xueyang Yu, Haotian Zhang, J. Demmel
Tags: MoE
24 Jun 2023

Deep Fusion: Efficient Network Training via Pre-trained Initializations
Hanna Mazzawi, X. Gonzalvo, Michael Wunder, Sammy Jerome, Benoit Dherin
Tags: AI4CE
20 Jun 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry
18 Jun 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang, Heyang Qin, S. A. Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He
Tags: VLM
16 Jun 2023

Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv, Yuqing Yang, Tengxiao Liu, Qi-jie Gao, Qipeng Guo, Xipeng Qiu
16 Jun 2023

DistSim: A performance model of large-scale hybrid distributed DNN training
Guandong Lu, Run Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, ..., Yanming Miao, Zhifang Cai, Li-Wei Li, Jingwen Leng, Minyi Guo
14 Jun 2023

On the Role of Attention in Prompt-tuning
Samet Oymak, A. S. Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis
Tags: MLT, LRM
06 Jun 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao
05 Jun 2023
Proteus: Simulating the Performance of Distributed DNN Training
Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, Dahua Lin
04 Jun 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Tags: ALM
29 May 2023

Scaling Data-Constrained Language Models
Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, S. Pyysalo, Thomas Wolf, Colin Raffel
Tags: ALM
25 May 2023

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li, Zhiquan Lai, Yanqi Hao, Weijie Liu, Ke-shi Ge, Xiaoge Deng, Dongsheng Li, KaiCheng Lu
25 May 2023

Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
Leo Liu, Tim Dettmers, Xi Lin, Ves Stoyanov, Xian Li
Tags: MoE
23 May 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren, Amaru Cuba Gyllensten, Felix Stollenwerk, Joey Öhman, T. Isbister, Evangelia Gogoulou, F. Carlsson, Alice Heiman, Judit Casademont, Magnus Sahlgren
22 May 2023

Fast Distributed Inference Serving for Large Language Models
Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin
10 May 2023

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Deepak Narayanan, Keshav Santhanam, Peter Henderson, Rishi Bommasani, Tony Lee, Percy Liang
03 May 2023

Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Xin Chen, Hengheng Zhang, Xiaotao Gu, Kaifeng Bi, Lingxi Xie, Qi Tian
Tags: MoE
22 Apr 2023

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Yanli Zhao, Andrew Gu, R. Varma, Liangchen Luo, Chien-chin Huang, ..., Bernard Nguyen, Geeta Chauhan, Y. Hao, Ajit Mathews, Shen Li
Tags: FedML, MoE
21 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao, Siqi Fan, Xiusheng Huang, Xuezhi Fang, Xiang Li, ..., Peng Han, Shuo Shang, Kang Liu, Aixin Sun, Yequan Wang
14 Apr 2023

ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review
Sunder Ali Khowaja, P. Khuwaja, K. Dev, Weizheng Wang, Lewis Nkenyereye
13 Apr 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang-Ming Cao, Bin Cui
Tags: MoE
08 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
Tags: VLM
07 Apr 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, T. Krishna
24 Mar 2023
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4
Zheng-Long Liu, Yue Huang, Xiao-Xing Yu, Lu Zhang, Zihao Wu, ..., Dinggang Shen, Quanzheng Li, Tianming Liu, Dajiang Zhu, Xiang Li
Tags: LM&MA, MedIm
20 Mar 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
13 Mar 2023

OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan, Juncheng Liu, Jinhui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao
11 Mar 2023

An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei, Yun Cheng Wang, Bin Wang, C.-C. Jay Kuo
10 Mar 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui
Tags: MoE
06 Mar 2023

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches
Siyu Wang, Zongyan Cao, Chang Si, Lansong Diao, Jiamang Wang, W. Lin
03 Mar 2023

A Pathway Towards Responsible AI Generated Content
Chen Chen, Jie Fu, Lingjuan Lyu
02 Mar 2023