Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

3 January 2022

Papers citing "Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models"

Title | Authors | Tags | Counts | Date
On Multilingual Encoder Language Model Compression for Low-Resource Languages | Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann | | 190, 0, 0 | 22 May 2025
Small Language Models in the Real World: Insights from Industrial Text Classification | Lujun Li, Lama Sleem, Niccolo Gentile, Geoffrey Nichil, Radu State | LLMAG | 218, 0, 0 | 21 May 2025
The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation | Haryo Akbarianto Wibowo, Thamar Solorio, Alham Fikri Aji | | 78, 3, 0 | 24 Jun 2024
Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service | Mirza Alim Mutasodirin, Radityo Eko Prasojo, Achmad F. Abka, Hanif Rasyidi | VLM | 58, 0, 0 | 19 Mar 2024
Improving Neural Topic Models with Wasserstein Knowledge Distillation | Suman Adhya, Debarshi Kumar Sanyal | BDL | 109, 1, 0 | 27 Mar 2023
Xception: Deep Learning with Depthwise Separable Convolutions | François Chollet | MDE, BDL, PINN | 1.6K, 14,698, 0 | 07 Oct 2016