Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2201.11990
Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"
50 / 501 papers shown
Title
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
26
185
0
22 May 2022
Clinical Prompt Learning with Frozen Language Models
Niall Taylor
Yi Zhang
Dan W Joyce
A. Nevado-Holgado
Andrey Kormilitzin
VLM
LM&MA
16
31
0
11 May 2022
Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
M. Shoeybi
Bryan Catanzaro
AI4CE
27
256
0
10 May 2022
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
A. Kolesnikova
Yuri Kuratov
Vasily Konovalov
Mikhail Burtsev
VLM
21
10
0
04 May 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
54
3,486
0
02 May 2022
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
Ehud D. Karpas
Omri Abend
Yonatan Belinkov
Barak Lenz
Opher Lieber
...
Erez Schwartz
Gal Shachaf
Shai Shalev-Shwartz
Amnon Shashua
Moshe Tenenholtz
LLMAG
12
68
0
01 May 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul M. Chilimbi
Mu Li
Xin Jin
11
39
0
30 Apr 2022
Inferring Implicit Relations in Complex Questions with Language Models
Uri Katz
Mor Geva
Jonathan Berant
ReLM
LRM
22
11
0
28 Apr 2022
Autoregressive Search Engines: Generating Substrings as Document Identifiers
Michele Bevilacqua
G. Ottaviano
Patrick Lewis
Wen-tau Yih
Sebastian Riedel
Fabio Petroni
KELM
RALM
30
155
0
22 Apr 2022
Improving Passage Retrieval with Zero-Shot Question Generation
Devendra Singh Sachan
M. Lewis
Mandar Joshi
Armen Aghajanyan
Wen-tau Yih
J. Pineau
Luke Zettlemoyer
OOD
LRM
21
155
0
15 Apr 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
63
800
0
14 Apr 2022
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj
Chenyan Xiong
Guolin Ke
Xiaodong Liu
Di He
Saurabh Tiwary
Tie-Yan Liu
Paul N. Bennett
Xia Song
Jianfeng Gao
42
32
0
13 Apr 2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang
Adam Roberts
Daniel Hesslow
Teven Le Scao
Hyung Won Chung
Iz Beltagy
Julien Launay
Colin Raffel
26
168
0
12 Apr 2022
Considerations for Multilingual Wikipedia Research
Isaac Johnson
Emily A. Lescak
10
3
0
05 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
83
6,004
0
05 Apr 2022
Model-Parallel Fourier Neural Operators as Learned Surrogates for Large-Scale Parametric PDEs
Thomas J. Grady
R. Khan
M. Louboutin
Ziyi Yin
Philipp A. Witte
Ranveer Chandra
Russell J. Hewett
Felix J. Herrmann
AI4CE
18
33
0
04 Apr 2022
Scaling Up Models and Data with
t5x
\texttt{t5x}
t5x
and
seqio
\texttt{seqio}
seqio
Adam Roberts
Hyung Won Chung
Anselm Levskaya
Gaurav Mishra
James Bradbury
...
Brennan Saeta
Ryan Sepassi
A. Spiridonov
Joshua Newlan
Andrea Gesmundo
ALM
43
193
0
31 Mar 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
37
18
0
31 Mar 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
37
1,830
0
29 Mar 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Rogerio Bonatti
LM&Ro
26
49
0
25 Mar 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
237
257
0
21 Mar 2022
A Survey of Multi-Tenant Deep Learning Inference on GPU
Fuxun Yu
Di Wang
Longfei Shangguan
Minjia Zhang
Chenchen Liu
Xiang Chen
BDL
AI4CE
11
32
0
17 Mar 2022
Multi-Stage Prompting for Knowledgeable Dialogue Generation
Zihan Liu
M. Patwary
R. Prenger
Shrimai Prabhumoye
Wei Ping
M. Shoeybi
Bryan Catanzaro
24
49
0
16 Mar 2022
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
Bernal Jiménez Gutiérrez
Nikolas McNeal
Clay Washington
You Chen
Lang Li
Huan Sun
Yu-Chuan Su
20
150
0
16 Mar 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM
MQ
MedIm
20
119
0
14 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
156
0
01 Mar 2022
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
Jingyan Zhou
Deng Jiawen
Fei Mi
Yitong Li
Yasheng Wang
Minlie Huang
Xin Jiang
Qun Liu
H. Meng
25
31
0
16 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
22
20
0
12 Feb 2022
Compute Trends Across Three Eras of Machine Learning
J. Sevilla
Lennart Heim
A. Ho
T. Besiroglu
Marius Hobbhahn
Pablo Villalobos
20
269
0
11 Feb 2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Boxin Wang
Wei Ping
Chaowei Xiao
P. Xu
M. Patwary
M. Shoeybi
Bo-wen Li
Anima Anandkumar
Bryan Catanzaro
14
64
0
08 Feb 2022
Accelerated Quality-Diversity through Massive Parallelism
Bryan Lim
Maxime Allard
Luca Grillotti
Antoine Cully
17
16
0
02 Feb 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
32
283
0
14 Jan 2022
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
54
188
0
20 Dec 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
Shrimai Prabhumoye
Rafal Kocielnik
M. Shoeybi
Anima Anandkumar
Bryan Catanzaro
30
20
0
15 Dec 2021
Discourse-Aware Soft Prompting for Text Generation
Marjan Ghazvininejad
Vladimir Karpukhin
Vera Gor
Asli Celikyilmaz
23
6
0
10 Dec 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
49
1,746
0
18 Nov 2021
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
J. L. Berral
Oriol Aranda
J. L. Domínguez
Jordi Torres
OOD
13
3
0
29 Oct 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,656
0
15 Oct 2021
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie M. Zhang
Yong Li
Wei Lin
Jingren Zhou
Hongxia Yang
MoE
92
43
0
08 Oct 2021
PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation
Siqi Bao
H. He
Fan Wang
Hua-Hong Wu
Haifeng Wang
...
Xinxian Huang
Xin Tian
Xinchao Xu
Yingzhan Lin
Zhengyu Niu
VLM
ALM
24
60
0
20 Sep 2021
Towards Zero-Label Language Learning
Zirui Wang
Adams Wei Yu
Orhan Firat
Yuan Cao
SyDa
182
102
0
19 Sep 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
247
193
0
15 Sep 2021
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
Conglong Li
Minjia Zhang
Yuxiong He
15
37
0
13 Aug 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,844
0
18 Apr 2021
Creativity and Machine Learning: A Survey
Giorgio Franceschelli
Mirco Musolesi
VLM
AI4CE
29
40
0
06 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
259
374
0
28 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
253
1,989
0
31 Dec 2020
Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
Ileana Rugina
Rumen Dangovski
L. Jing
Preslav Nakov
Marin Soljacic
18
0
0
20 Nov 2020
Improving Readability for Automatic Speech Recognition Transcription
Junwei Liao
Sefik Emre Eskimez
Liyang Lu
Yu Shi
Ming Gong
Linjun Shou
Hong Qu
Michael Zeng
27
55
0
09 Apr 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,817
0
17 Sep 2019
Previous
1
2
3
...
10
11
9
Next