Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2201.11990
Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"
50 / 501 papers shown
Title
MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers
Mohammadmahdi Nouriborji
Omid Rohanian
Samaneh Kouchaki
David A. Clifton
24
8
0
12 Oct 2022
Context Generation Improves Open Domain Question Answering
Dan Su
M. Patwary
Shrimai Prabhumoye
Peng-Tao Xu
R. Prenger
M. Shoeybi
Pascale Fung
Anima Anandkumar
Bryan Catanzaro
LLMAG
LRM
17
6
0
12 Oct 2022
Decoupled Context Processing for Context Augmented Language Modeling
Zonglin Li
Ruiqi Guo
Surinder Kumar
RALM
KELM
19
23
0
11 Oct 2022
Continual Training of Language Models for Few-Shot Learning
Zixuan Ke
Haowei Lin
Yijia Shao
Hu Xu
Lei Shu
Bin Liu
KELM
BDL
CLL
87
34
0
11 Oct 2022
Controllable Dialogue Simulation with In-Context Learning
Zekun Li
Wenhu Chen
Shiyang Li
Hong Wang
Jingu Qian
Xi Yan
136
44
0
09 Oct 2022
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon
Jeonghoon Kim
Jeongin Bae
Kang Min Yoo
Jin-Hwa Kim
Baeseong Park
Byeongwook Kim
Jung-Woo Ha
Nako Sung
Dongsoo Lee
MQ
26
30
0
08 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
114
93
0
06 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Social and environmental impact of recent developments in machine learning on biology and chemistry research
Daniel Probst
22
1
0
01 Oct 2022
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Zhenhailong Wang
Xiaoman Pan
Dian Yu
Dong Yu
Jianshu Chen
Heng Ji
VLM
38
9
0
01 Oct 2022
Improving Robustness with Adaptive Weight Decay
Amin Ghiasi
Ali Shafahi
R. Ardekani
OOD
17
7
0
30 Sep 2022
PROD: Progressive Distillation for Dense Retrieval
Zhenghao Lin
Yeyun Gong
Xiao Liu
Hang Zhang
Chen Lin
...
Jian Jiao
Jing Lu
Daxin Jiang
Rangan Majumder
Nan Duan
48
27
0
27 Sep 2022
SpeedLimit: Neural Architecture Search for Quantized Transformer Models
Yuji Chai
Luke Bailey
Yunho Jin
Matthew Karle
Glenn G. Ko
David Brooks
Gu-Yeon Wei
H. T. Kung
MQ
12
0
0
25 Sep 2022
Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
Mario Krenn
L. Buffoni
B. Coutinho
S. Eppel
J. Foster
...
Ngoc M. Tran
Francisco Valente
Yangxinyu Xie
Rose Yu
Michael K Kopp
30
41
0
23 Sep 2022
Variational Open-Domain Question Answering
Valentin Liévin
Andreas Geert Motzfeldt
Ida Riis Jensen
Ole Winther
OOD
BDL
36
8
0
23 Sep 2022
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Rogerio Bonatti
Sai H. Vemprala
Shuang Ma
Felipe Vieira Frujeri
Shuhang Chen
Ashish Kapoor
33
22
0
22 Sep 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
13
2
0
22 Sep 2022
Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
Xuesu Xiao
Tingnan Zhang
K. Choromanski
Edward J. Lee
Anthony G. Francis
...
Leila Takayama
Roy Frostig
Jie Tan
Carolina Parada
Vikas Sindhwani
72
54
0
22 Sep 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
M. Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
69
122
0
12 Sep 2022
EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models
Jiangsu Du
Ziming Liu
Jiarui Fang
Shenggui Li
Yongbin Li
Yutong Lu
Yang You
MoE
27
4
0
06 Sep 2022
Transformers with Learnable Activation Functions
Haishuo Fang
Ji-Ung Lee
N. Moosavi
Iryna Gurevych
AI4CE
25
7
0
30 Aug 2022
DiVa: An Accelerator for Differentially Private Machine Learning
Beom-Joo Park
Ranggi Hwang
Dongho Yoon
Yoonhyuk Choi
Minsoo Rhu
24
8
0
26 Aug 2022
Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang
Congliang Chen
Naichen Shi
Ruoyu Sun
Zhimin Luo
18
62
0
20 Aug 2022
Domain-Specific Text Generation for Machine Translation
Yasmin Moslem
Rejwanul Haque
John D. Kelleher
Andy Way
16
16
0
11 Aug 2022
LATTE: LAnguage Trajectory TransformEr
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Sai H. Vemprala
Rogerio Bonatti
LM&Ro
31
59
0
04 Aug 2022
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
Ziyi Wang
Xumin Yu
Yongming Rao
Jie Zhou
Jiwen Lu
VPVLM
VLM
21
75
0
04 Aug 2022
giMLPs: Gate with Inhibition Mechanism in MLPs
Cheng Kang
Jindich Prokop
Lei Tong
Huiyu Zhou
Yong Hu
Daneil Novak
21
0
0
01 Aug 2022
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
13
3
0
25 Jul 2022
Can large language models reason about medical questions?
Valentin Liévin
C. Hother
Andreas Geert Motzfeldt
Ole Winther
ELM
LM&MA
AI4MH
LRM
24
299
0
17 Jul 2022
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi
Samyam Rajbhandari
Minjia Zhang
A. A. Awan
Cheng-rong Li
...
Elton Zheng
Jeff Rasley
Shaden Smith
Olatunji Ruwase
Yuxiong He
29
335
0
30 Jun 2022
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM
ELM
LRM
58
739
0
29 Jun 2022
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Karthik Valmeekam
Matthew Marquez
Alberto Olmo
S. Sreedharan
Subbarao Kambhampati
ReLM
LRM
25
197
0
21 Jun 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
15
73
0
20 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
51
392
0
17 Jun 2022
Towards Understanding How Machines Can Learn Causal Overhypotheses
Eliza Kosoy
David M. Chan
Adrian Liu
Jasmine Collins
Bryanna Kaufmann
Sandy Han Huang
Jessica B. Hamrick
John F. Canny
Nan Rosemary Ke
Alison Gopnik
CML
AI4CE
26
18
0
16 Jun 2022
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
Jack G. M. FitzGerald
Shankar Ananthakrishnan
Konstantine Arkoudas
Davide Bernardi
Abhishek Bhagia
...
Pan Wei
Haiyang Yu
Shuai Zheng
Gökhan Tür
Premkumar Natarajan
ELM
6
30
0
15 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
27
95
0
13 Jun 2022
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai
Shengwei Li
Xudong Tang
Ke-shi Ge
Weijie Liu
Yabo Duan
Linbo Qiao
Dongsheng Li
22
39
0
10 Jun 2022
On Data Scaling in Masked Image Modeling
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Yixuan Wei
Qi Dai
Han Hu
29
52
0
09 Jun 2022
Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang
A. Backurs
Sébastien Bubeck
Ronen Eldan
Suriya Gunasekar
Tal Wagner
LRM
28
85
0
09 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
45
441
0
04 Jun 2022
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Xiaoxia Wu
Z. Yao
Minjia Zhang
Conglong Li
Yuxiong He
MQ
19
31
0
04 Jun 2022
Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
Patrick Bareiss
Beatriz Souza
Marcelo d’Amorim
Michael Pradel
ELM
16
76
0
02 Jun 2022
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan
Yongjun He
Jared Davis
Tianyi Zhang
Tri Dao
Beidi Chen
Percy Liang
Christopher Ré
Ce Zhang
25
90
0
02 Jun 2022
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
R. Liu
Young Jin Kim
Alexandre Muzio
Hany Awadalla
MoE
47
22
0
28 May 2022
Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations
Kang Min Yoo
Junyeob Kim
Hyuhng Joon Kim
Hyunsoo Cho
Hwiyeol Jo
Sang-Woo Lee
Sang-goo Lee
Taeuk Kim
23
123
0
25 May 2022
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLL
LRM
145
117
0
24 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
322
4,077
0
24 May 2022
On the Role of Bidirectionality in Language Model Pre-Training
Mikel Artetxe
Jingfei Du
Naman Goyal
Luke Zettlemoyer
Ves Stoyanov
22
16
0
24 May 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Conrad Borchers
Dalia Sara Gala
Ben Gilburt
Eduard Oravkin
Wilfried Bounsi
Yuki M. Asano
Hannah Rose Kirk
AI4CE
19
27
0
23 May 2022
Previous
1
2
3
...
10
11
8
9
Next