Cited By
Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks (arXiv:2210.04834)
10 October 2022
Charith Peris, Lizhen Tan, Thomas Gueudré, Turan Gojayev, Vivi Wei, Gokmen Oz
Papers citing "Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks" (5 papers shown)
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
VLM, AI4CE, CLL · 2,420 citations · 23 Apr 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
7,481 citations · 02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM · 1,857 citations · 23 Sep 2019
Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, J. Tran, W. Dally
CVBM · 6,660 citations · 08 Jun 2015
FitNets: Hints for Thin Deep Nets
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio
FedML · 3,870 citations · 19 Dec 2014