2004.05686
Cited By
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
12 April 2020
Subhabrata Mukherjee
Ahmed Hassan Awadallah
Papers citing
"XtremeDistil: Multi-stage Distillation for Massive Multilingual Models"
22 / 22 papers shown
Title
AgentInstruct: Toward Generative Teaching with Agentic Flows
Arindam Mitra
Luciano Del Corro
Guoqing Zheng
Shweti Mahajan
Dany Rouhana
...
Corby Rosset
Fillipe Silva
Hamed Khanpour
Yash Lara
Ahmed Awadallah
SyDa
40
25
0
03 Jul 2024
Efficiently Distilling LLMs for Edge Applications
Achintya Kundu
Fabian Lim
Aaron Chew
L. Wynter
Penny Chong
Rhui Dih Lee
50
6
0
01 Apr 2024
An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation
Md Arafat Sultan
Aashka Trivedi
Parul Awasthy
Avirup Sil
38
0
0
12 Jan 2024
A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Nitay Calderon
Subhabrata Mukherjee
Roi Reichart
Amir Kantor
44
17
0
03 May 2023
Distillation of encoder-decoder transformers for sequence labelling
M. Farina
D. Pappadopulo
Anant Gupta
Leslie Huang
Ozan Irsoy
Thamar Solorio
VLM
105
3
0
10 Feb 2023
Friend-training: Learning from Models of Different but Related Tasks
Mian Zhang
Lifeng Jin
Linfeng Song
Haitao Mi
Xiabing Zhou
Dong Yu
VLM
40
0
0
31 Jan 2023
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang
Yanda Chen
Zhou Yu
Kathleen McKeown
27
30
0
20 Dec 2022
Compressing Cross-Lingual Multi-Task Models at Qualtrics
Daniel Fernando Campos
Daniel J. Perry
S. Joshi
Yashmeet Gambhir
Wei Du
Zhengzheng Xing
Aaron Colak
24
1
0
29 Nov 2022
Intriguing Properties of Compression on Multilingual Models
Kelechi Ogueji
Orevaoghene Ahia
Gbemileke Onilude
Sebastian Gehrmann
Sara Hooker
Julia Kreutzer
26
12
0
04 Nov 2022
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Selim Fekih
Nicolò Tamagnone
Benjamin Minixhofer
R. Shrestha
Ximena Contla
Ewan Oglethorpe
Navid Rekabsaz
21
6
0
10 Oct 2022
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar
Mehdi Rezagholizadeh
Abbas Ghaddar
Khalil Bibi
Philippe Langlais
Pascal Poupart
CLL
35
6
0
15 Apr 2022
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara
Luca Soldaini
Eric Lind
Alessandro Moschitti
29
6
0
15 Jan 2022
Learning Cross-Lingual IR from an English Retriever
Yulong Li
M. Franz
Md Arafat Sultan
Bhavani Iyer
Young-Suk Lee
Avirup Sil
VLM
22
28
0
15 Dec 2021
MetaQA: Combining Expert Agents for Multi-Skill Question Answering
Haritz Puerto
Gözde Gül Sahin
Iryna Gurevych
LLMAG
33
20
0
03 Dec 2021
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Subhabrata Mukherjee
Ahmed Hassan Awadallah
Jianfeng Gao
19
22
0
08 Jun 2021
AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER
Weile Chen
Huiqiang Jiang
Qianhui Wu
Börje F. Karlsson
Yingjun Guan
21
35
0
04 Jun 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang
Hangbo Bao
Shaohan Huang
Li Dong
Furu Wei
MQ
30
257
0
31 Dec 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
95
142
0
24 Oct 2020
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang
Yong-jia Jiang
Zhaohui Yan
Zixia Jia
Nguyen Bach
Tao Wang
Zhongqiang Huang
Fei Huang
Kewei Tu
26
10
0
10 Oct 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
47
1,209
0
25 Feb 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,996
0
20 Apr 2018