XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation (8 June 2021)
arXiv:2106.04563
Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
Papers citing "XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation" (30 of 30 papers shown):
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding (14 Dec 2020)
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
Self-training Improves Pre-training for Natural Language Understanding (05 Oct 2020)
Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Çelebi, Michael Auli, Ves Stoyanov, Alexis Conneau
Language Models are Few-Shot Learners (28 May 2020)
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (12 Apr 2020)
Subhabrata Mukherjee, Ahmed Hassan Awadallah
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (06 Apr 2020)
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (25 Feb 2020)
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (19 Feb 2020)
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (23 Oct 2019)
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Knowledge Distillation from Internal Representations (08 Oct 2019)
Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo
Extremely Small BERT Models from Mixed-Vocabulary Training (25 Sep 2019)
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
TinyBERT: Distilling BERT for Natural Language Understanding (23 Sep 2019)
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Patient Knowledge Distillation for BERT Model Compression (25 Aug 2019)
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
RoBERTa: A Robustly Optimized BERT Pretraining Approach (26 Jul 2019)
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
XLNet: Generalized Autoregressive Pretraining for Language Understanding (19 Jun 2019)
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Energy and Policy Considerations for Deep Learning in NLP (05 Jun 2019)
Emma Strubell, Ananya Ganesh, Andrew McCallum
Unsupervised Data Augmentation for Consistency Training (29 Apr 2019)
Qizhe Xie, Zihang Dai, Eduard H. Hovy, Minh-Thang Luong, Quoc V. Le
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding (20 Apr 2019)
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
PAWS: Paraphrase Adversaries from Word Scrambling (01 Apr 2019)
Yuan Zhang, Jason Baldridge, Luheng He
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (28 Mar 2019)
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (11 Oct 2018)
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (09 Mar 2018)
Jonathan Frankle, Michael Carbin
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations (15 Nov 2017)
John Wieting, Kevin Gimpel
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference (18 Apr 2017)
Adina Williams, Nikita Nangia, Samuel R. Bowman
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (26 Sep 2016)
Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
SQuAD: 100,000+ Questions for Machine Comprehension of Text (16 Jun 2016)
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (01 Oct 2015)
Song Han, Huizi Mao, W. Dally
A large annotated corpus for learning natural language inference (21 Aug 2015)
Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning
Distilling the Knowledge in a Neural Network (09 Mar 2015)
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Compressing Deep Convolutional Networks using Vector Quantization (18 Dec 2014)
Yunchao Gong, Liu Liu, Ming Yang, Lubomir D. Bourdev