arXiv: 2206.07808
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
15 June 2022
Jack G. M. FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudré, Dilek Z. Hakkani-Tür, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Bei Liu, Jianhua Lu, A. Manzotti, P. Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandan Prakash, Stephen Rawls, Andrew Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Sridhar, Lizhen Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gokhan Tur, Premkumar Natarajan
Papers citing "Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems"
Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks
Charith Peris, Lizhen Tan, Thomas Gueudré, Turan Gojayev, Vivi Wei, Gokmen Oz
10 Oct 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan, Shankar Ananthakrishnan, Jack G. M. FitzGerald, Rahul Gupta, Wael Hamza, ..., Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Premkumar Natarajan
02 Aug 2022
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
22 Sep 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, Christopher D. Manning
16 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019