ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.11990
  4. Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A
  Large-Scale Generative Language Model

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
    MoE
ArXivPDFHTML

Papers citing "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"

50 / 501 papers shown
Title
Fairness Definitions in Language Models Explained
Fairness Definitions in Language Models Explained
Thang Viet Doan
Zhibo Chu
Zichong Wang
Wenbin Zhang
ALM
55
10
0
26 Jul 2024
ALLaM: Large Language Models for Arabic and English
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
34
11
0
22 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current
  Status, Challenges, and Perspectives
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MA
OffRL
25
22
0
20 Jul 2024
On Pre-training of Multimodal Language Models Customized for Chart
  Understanding
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
42
5
0
19 Jul 2024
Enhancing Split Computing and Early Exit Applications through Predefined
  Sparsity
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Luigi Capogrosso
Enrico Fraccaroli
Giulio Petrozziello
Francesco Setti
Samarjit Chakraborty
Franco Fummi
Marco Cristani
26
3
0
16 Jul 2024
Investigating Low-Rank Training in Transformer Language Models:
  Efficiency and Scaling Analysis
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
33
1
0
13 Jul 2024
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
  with Adaptive Expert Placement
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Yongji Wu
Wenjie Qu
Tianyang Tao
Zhuang Wang
Wei Bai
Zhuohao Li
Yuan Tian
Jiaheng Zhang
Matthew Lentz
Danyang Zhuo
61
3
0
05 Jul 2024
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
John Mendonça
A. Lavie
Isabel Trancoso
ELM
43
2
0
04 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
52
20
0
27 Jun 2024
Fairness and Bias in Multimodal AI: A Survey
Fairness and Bias in Multimodal AI: A Survey
Tosin P. Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
55
9
0
27 Jun 2024
Universal Checkpointing: Efficient and Flexible Checkpointing for Large
  Scale Distributed Training
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training
Xinyu Lian
Sam Ade Jacobs
Lev Kurilenko
Masahiro Tanaka
Stas Bekman
Olatunji Ruwase
Minjia Zhang
OffRL
18
8
0
27 Jun 2024
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical
  and Chemistry
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Linqing Chen
Weilei Wang
Zilong Bai
Peng Xu
Yan Fang
...
Lisha Zhang
Fu Bian
Zhongkai Ye
Lidong Pei
Changyang Tu
AI4MH
LM&MA
50
2
0
26 Jun 2024
Building on Efficient Foundations: Effectively Training LLMs with
  Structured Feedforward Layers
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
22
0
0
24 Jun 2024
FastPersist: Accelerating Model Checkpointing in Deep Learning
FastPersist: Accelerating Model Checkpointing in Deep Learning
Guanhua Wang
Olatunji Ruwase
Bing Xie
Yuxiong He
27
7
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better
  Performance
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
37
4
0
15 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
Resource Allocation and Workload Scheduling for Large-Scale Distributed
  Deep Learning: A Survey
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Chengming Li
Victor C. M. Leung
Yanyi Guo
Xiping Hu
38
3
0
12 Jun 2024
OLMES: A Standard for Language Model Evaluations
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
40
14
0
12 Jun 2024
Image Textualization: An Automatic Framework for Creating Accurate and
  Detailed Image Descriptions
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Renjie Pi
Jianshu Zhang
Jipeng Zhang
Rui Pan
Zhekai Chen
Tong Zhang
3DV
47
19
0
11 Jun 2024
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel
  Fusion
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Li-Wen Chang
Wenlei Bao
Qi Hou
Chengquan Jiang
Ningxin Zheng
...
Zuquan Song
Ziheng Jiang
Haibin Lin
Xin Jin
Xin Liu
36
19
0
11 Jun 2024
Scaling Large Language Model-based Multi-Agent Collaboration
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE
LLMAG
LM&Ro
58
45
0
11 Jun 2024
Investigating and Addressing Hallucinations of LLMs in Tasks Involving
  Negation
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
Neeraj Varshney
Satyam Raj
Venkatesh Mishra
Agneet Chatterjee
Ritika Sarkar
Amir Saeidi
Chitta Baral
LRM
35
7
0
08 Jun 2024
PETRA: Parallel End-to-end Training with Reversible Architectures
PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud
Louis Fournier
Thomas Pumir
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
21
0
0
04 Jun 2024
Sparsity-Accelerated Training for Large Language Models
Sparsity-Accelerated Training for Large Language Models
Da Ma
Lu Chen
Pengyu Wang
Hongshen Xu
Hanqi Li
Liangtai Sun
Su Zhu
Shuai Fan
Kai Yu
LRM
33
0
0
03 Jun 2024
ACCO: Accumulate while you Communicate, Hiding Communications in
  Distributed LLM Training
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Eugene Belilovsky
Edouard Oyallon
FedML
46
1
0
03 Jun 2024
Occam Gradient Descent
Occam Gradient Descent
B. N. Kausik
ODL
VLM
32
0
0
30 May 2024
Smart Bilingual Focused Crawling of Parallel Documents
Smart Bilingual Focused Crawling of Parallel Documents
Cristian García-Romero
Miquel Espla-Gomis
Felipe Sánchez-Martínez
24
0
0
23 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
40
2
0
15 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with
  Associative Memory
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
31
6
0
14 May 2024
Risks and Opportunities of Open-Source Generative AI
Risks and Opportunities of Open-Source Generative AI
Francisco Eiras
Aleksander Petrov
Bertie Vidgen
Christian Schroeder
Fabio Pizzati
...
Matthew Jackson
Phillip H. S. Torr
Trevor Darrell
Y. Lee
Jakob N. Foerster
40
18
0
14 May 2024
ChuXin: 1.6B Technical Report
ChuXin: 1.6B Technical Report
Xiaomin Zhuang
Yufan Jiang
Qiaozhi He
Zhihua Wu
ALM
41
0
0
08 May 2024
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras
Aleksandar Petrov
Bertie Vidgen
Christian Schroeder de Witt
Fabio Pizzati
...
Paul Röttger
Philip H. S. Torr
Trevor Darrell
Y. Lee
Jakob N. Foerster
46
6
0
25 Apr 2024
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?
Chengwei Qin
Wenhan Xia
Tan Wang
Fangkai Jiao
Yuchen Hu
Bosheng Ding
Ruirui Chen
Shafiq R. Joty
LRM
37
3
0
19 Apr 2024
Towards Universal Performance Modeling for Machine Learning Training on
  Multi-GPU Platforms
Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
Zhongyi Lin
Ning Sun
Pallab Bhattacharya
Xizhou Feng
Louis Feng
John Douglas Owens
32
1
0
19 Apr 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A
  Comprehensive Survey
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
34
6
0
09 Apr 2024
How much reliable is ChatGPT's prediction on Information Extraction
  under Input Perturbations?
How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?
Ishani Mondal
Abhilasha Sancheti
17
1
0
07 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating
  Mixture-of-Experts
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
53
7
0
07 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
37
24
0
06 Apr 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models
Toward Inference-optimal Mixture-of-Expert Large Language Models
Longfei Yun
Yonghao Zhuang
Yao Fu
Eric P. Xing
Hao Zhang
MoE
68
6
0
03 Apr 2024
Opportunities and challenges in the application of large artificial
  intelligence models in radiology
Opportunities and challenges in the application of large artificial intelligence models in radiology
Liangrui Pan
Zhenyu Zhao
Ying Lu
Kewei Tang
Liyong Fu
Qingchun Liang
Shaoliang Peng
LM&MA
MedIm
AI4CE
39
5
0
24 Mar 2024
SEVEN: Pruning Transformer Model by Reserving Sentinels
SEVEN: Pruning Transformer Model by Reserving Sentinels
Jinying Xiao
Ping Li
Jie Nie
Zhe Tang
31
3
0
19 Mar 2024
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Hakim Sidahmed
Samrat Phatale
Alex Hutcheson
Zhuonan Lin
Zhan Chen
...
Jessica Hoffmann
Hassan Mansoor
Wei Li
Abhinav Rastogi
Lucas Dixon
30
1
0
15 Mar 2024
Strengthening Multimodal Large Language Model with Bootstrapped
  Preference Optimization
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi
Tianyang Han
Wei Xiong
Jipeng Zhang
Runtao Liu
Rui Pan
Tong Zhang
MLLM
37
33
0
13 Mar 2024
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural
  Networks
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks
Louis Fournier
Edouard Oyallon
33
0
0
13 Mar 2024
AutoDev: Automated AI-Driven Development
AutoDev: Automated AI-Driven Development
Michele Tufano
Anisha Agarwal
Jinu Jang
Roshanak Zilouchian Moghaddam
Neel Sundaresan
36
15
0
13 Mar 2024
Model Parallelism on Distributed Infrastructure: A Literature Review
  from Theory to LLM Case-Studies
Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies
Felix Brakel
Uraz Odyurt
A. Varbanescu
GNN
31
11
0
06 Mar 2024
Should We Fear Large Language Models? A Structural Analysis of the Human
  Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens
  of Heidegger's Philosophy
Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger's Philosophy
Jianqiiu Zhang
ELM
35
1
0
05 Mar 2024
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Hanlin Tang
Yifu Sun
Decheng Wu
Kai Liu
Jianchen Zhu
Zhanhui Kang
MQ
28
10
0
05 Mar 2024
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts
  for Medical Open-Domain Question Answering
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering
Giacomo Frisoni
Alessio Cocchieri
Alex Presepi
Gianluca Moro
Zaiqiao Meng
RALM
MedIm
44
15
0
04 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
46
44
0
04 Mar 2024
Previous
12345...91011
Next