MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

Papers citing "MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models" (17 papers)
ERNIE: Enhanced Language Representation with Informative Entities
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu
17 May 2019
