XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation (8 June 2021)
arXiv:2106.04563
Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
Papers citing "XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation" (30 of 30 papers shown):
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding (14 Dec 2020)
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
Self-training Improves Pre-training for Natural Language Understanding (05 Oct 2020)
Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Çelebi, Michael Auli, Ves Stoyanov, Alexis Conneau
Language Models are Few-Shot Learners (28 May 2020)
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (12 Apr 2020)
Subhabrata Mukherjee, Ahmed Hassan Awadallah
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (06 Apr 2020)
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (25 Feb 2020)
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (19 Feb 2020)
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (23 Oct 2019)
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Knowledge Distillation from Internal Representations (08 Oct 2019)
Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo
Extremely Small BERT Models from Mixed-Vocabulary Training (25 Sep 2019)
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
TinyBERT: Distilling BERT for Natural Language Understanding (23 Sep 2019)
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Patient Knowledge Distillation for BERT Model Compression (25 Aug 2019)
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
RoBERTa: A Robustly Optimized BERT Pretraining Approach (26 Jul 2019)
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
XLNet: Generalized Autoregressive Pretraining for Language Understanding (19 Jun 2019)
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Energy and Policy Considerations for Deep Learning in NLP (05 Jun 2019)
Emma Strubell, Ananya Ganesh, Andrew McCallum
Unsupervised Data Augmentation for Consistency Training (29 Apr 2019)
Qizhe Xie, Zihang Dai, Eduard H. Hovy, Minh-Thang Luong, Quoc V. Le
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding (20 Apr 2019)
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
PAWS: Paraphrase Adversaries from Word Scrambling (01 Apr 2019)
Yuan Zhang, Jason Baldridge, Luheng He
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (28 Mar 2019)
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (11 Oct 2018)
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (09 Mar 2018)
Jonathan Frankle, Michael Carbin
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations (15 Nov 2017)
John Wieting, Kevin Gimpel
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference (18 Apr 2017)
Adina Williams, Nikita Nangia, Samuel R. Bowman
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (26 Sep 2016)
Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
SQuAD: 100,000+ Questions for Machine Comprehension of Text (16 Jun 2016)
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (01 Oct 2015)
Song Han, Huizi Mao, W. Dally
A large annotated corpus for learning natural language inference (21 Aug 2015)
Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning
Distilling the Knowledge in a Neural Network (09 Mar 2015)
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Compressing Deep Convolutional Networks using Vector Quantization (18 Dec 2014)
Yunchao Gong, Liu Liu, Ming Yang, Lubomir D. Bourdev