Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (arXiv:2201.11990)

28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
    MoE

Papers citing "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"

50 / 501 papers shown
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang
Xiang Liu
Zeyu Li
Xinglin Pan
Peijie Dong
...
Rui Guo
Xin Wang
Qiong Luo
S. Shi
Xiaowen Chu
41
7
0
07 Nov 2023
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang
Yuwen Xiong
Ze Yang
Sergio Casas
Rui Hu
R. Urtasun
39
50
0
02 Nov 2023
Learning From Mistakes Makes LLM Better Reasoner
Shengnan An
Zexiong Ma
Zeqi Lin
Nanning Zheng
Jian-Guang Lou
Weizhu Chen
LRM
24
75
0
31 Oct 2023
Unlearn What You Want to Forget: Efficient Unlearning for LLMs
Jiaao Chen
Diyi Yang
MU
27
136
0
31 Oct 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
28
12
0
29 Oct 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng-Wei Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
59
39
0
27 Oct 2023
FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing
Terence Jie Chua
Wen-li Yu
Junfeng Zhao
Kwok-Yan Lam
FedML
24
5
0
26 Oct 2023
Enhancing Zero-Shot Crypto Sentiment with Fine-tuned Language Model and Prompt Engineering
Rahman S. M. Wahidur
Ishmam Tashdeed
Manjit Kaur
Heung-No Lee
ALM
25
17
0
20 Oct 2023
Primacy Effect of ChatGPT
Yiwei Wang
Yujun Cai
Muhao Chen
Yuxuan Liang
Bryan Hooi
ALM
AI4MH
LRM
33
13
0
20 Oct 2023
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li
Qi Lei
Wei Cheng
Dongkuan Xu
KELM
19
3
0
19 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
25
2
0
17 Oct 2023
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang
Marco Zeller
R. Waleffe
Torsten Hoefler
Gustavo Alonso
47
16
0
15 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
28
0
0
11 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Boxin Wang
Wei Ping
Lawrence C. McAfee
Peng-Tao Xu
Bo Li
M. Shoeybi
Bryan Catanzaro
RALM
16
46
0
11 Oct 2023
Fast-ELECTRA for Efficient Pre-training
Chengyu Dong
Liyuan Liu
Hao Cheng
Jingbo Shang
Jianfeng Gao
Xiaodong Liu
37
2
0
11 Oct 2023
On the Impact of Cross-Domain Data on German Language Models
Amin Dada
Aokun Chen
C.A.I. Peng
Kaleb E. Smith
Ahmad Idrissi-Yaghir
...
Daniel Truhn
Jan Egger
Jiang Bian
Jens Kleesiek
Yonghui Wu
17
4
0
11 Oct 2023
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu
Hongjin Su
Chen Xing
Boyu Mi
Qian Liu
...
Siheng Zhao
Lingpeng Kong
Bailin Wang
Caiming Xiong
Tao Yu
30
67
0
10 Oct 2023
LLM for SoC Security: A Paradigm Shift
Dipayan Saha
Shams Tarek
Katayoon Yahyaei
S. Saha
Jingbo Zhou
M. Tehranipoor
Farimah Farahmandi
56
46
0
09 Oct 2023
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Haodi Zhang
Min Cai
Xinhe Zhang
Chen Zhang
Rui Mao
Kaishun Wu
KELM
LRM
ReLM
30
8
0
08 Oct 2023
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Yue Wu
Xuan Tang
Tom Michael Mitchell
Yuanzhi Li
ELM
LLMAG
27
63
0
02 Oct 2023
Modularity in Deep Learning: A Survey
Haozhe Sun
Isabelle Guyon
MoMe
30
2
0
02 Oct 2023
Self-Supervised Open-Ended Classification with Small Visual Language Models
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
Yuki M. Asano
VLM
22
0
0
30 Sep 2023
GAIA-1: A Generative World Model for Autonomous Driving
Masane Fuchi
Lloyd Russell
Hudson Yeo
Zak Murez
Hiroto Minami
Alex Kendall
Tomohiro Takagi
Gianluca Corrado
VGen
28
215
0
29 Sep 2023
Graph Neural Prompting with Large Language Models
Yijun Tian
Huan Song
Zichen Wang
Haozhu Wang
Ziqing Hu
Fang Wang
Nitesh V. Chawla
Panpan Xu
AI4CE
35
44
0
27 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
25
100
0
25 Sep 2023
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz
S. Kaneda
Ruhan Wang
Rita Osi
Parteek Sharma
Fan Chen
Lei Jiang
28
56
0
25 Sep 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Juntao Li
Zecheng Tang
Yuyang Ding
Pinzheng Wang
Pei Guo
...
Wenliang Chen
Guohong Fu
Qiaoming Zhu
Guodong Zhou
M. Zhang
40
5
0
19 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
S. Song
57
11
0
19 Sep 2023
Towards Ontology Construction with Language Models
Maurice Funk
Simon Hosemann
J. C. Jung
Carsten Lutz
LRM
23
32
0
18 Sep 2023
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
Insu Jang
Zhenning Yang
Zhen Zhang
Xin Jin
Mosharaf Chowdhury
MoE
AI4CE
OODD
10
43
0
15 Sep 2023
Understanding the Impact of Post-Training Quantization on Large Language Models
Somnath Roy
MQ
30
3
0
11 Sep 2023
GPT Can Solve Mathematical Problems Without a Calculator
Z. Yang
Ming Ding
Qingsong Lv
Zhihuan Jiang
Zehai He
Yuyi Guo
Jinfeng Bai
Jie Tang
RALM
LRM
31
52
0
06 Sep 2023
Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics
Kyle Cranmer
G. Kanwar
S. Racanière
Danilo Jimenez Rezende
P. Shanahan
AI4CE
26
27
0
03 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie
Yibo Yang
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
S. Song
EGVM
25
18
0
02 Sep 2023
Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
Ziming Liu
Shenggan Cheng
Hao Zhou
Yang You
15
34
0
30 Aug 2023
Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection
Hongjin Qian
Zhicheng Dou
Jiejun Tan
Haonan Chen
Haoqi Gu
Ruofei Lai
Xinyu Zhang
Zhao Cao
Ji-Rong Wen
27
2
0
30 Aug 2023
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward
A. Titus
Adam Russell
33
1
0
28 Aug 2023
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Kushal Tirumala
Daniel Simig
Armen Aghajanyan
Ari S. Morcos
SyDa
13
104
0
23 Aug 2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Ranggi Hwang
Jianyu Wei
Shijie Cao
Changho Hwang
Xiaohu Tang
Ting Cao
Mao Yang
MoE
45
40
0
23 Aug 2023
Exploring the Effectiveness of GPT Models in Test-Taking: A Case Study of the Driver's License Knowledge Test
Saba Rahimi
T. Balch
Manuela Veloso
ELM
21
1
0
22 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao-quan Song
Junze Yin
21
18
0
21 Aug 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
49
59
0
20 Aug 2023
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
Dong Huang
Qi Bu
Yuhao Qing
Heming Cui
LRM
24
15
0
17 Aug 2023
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
Jie Huang
Wei Ping
Peng-Tao Xu
M. Shoeybi
Kevin Chen-Chuan Chang
Bryan Catanzaro
RALM
32
33
0
15 Aug 2023
EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce
Y. Li
Shirong Ma
Xiaobin Wang
Shen Huang
Chengyue Jiang
Haitao Zheng
Pengjun Xie
Fei Huang
Yong-jia Jiang
RALM
ALM
LRM
32
49
0
14 Aug 2023
Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception
Yunhao Yang
Cyrus Neary
Ufuk Topcu
LM&Ro
OffRL
25
5
0
10 Aug 2023
Cumulative Reasoning with Large Language Models
Yifan Zhang
Jingqin Yang
Yang Yuan
Andrew Chi-Chih Yao
ReLM
ELM
LRM
AI4CE
36
67
0
08 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
25
3
0
07 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
S. Shi
X. Chu
Bo-wen Li
AI4CE
13
91
0
07 Aug 2023