Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning

26 August 2023
Apoorv Dankar, Adeem Jassani, Kartikaeya Kumar

Papers citing "Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning"

3 / 3 papers shown
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019

Neural Network Acceptability Judgments
Alex Warstadt, Amanpreet Singh, Samuel R. Bowman
31 May 2018