arXiv: 2308.13958
Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning
26 August 2023
Apoorv Dankar, Adeem Jassani, Kartikaeya Kumar
Papers citing "Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning" (3 of 3 papers shown)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019
Neural Network Acceptability Judgments
Alex Warstadt, Amanpreet Singh, Samuel R. Bowman
31 May 2018