Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing

23 September 2021
Haoyu He, Xingjian Shi, Jonas W. Mueller, Sheng Zha, Mu Li, George Karypis

Papers citing "Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing"

10 of 10 papers shown.
Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation
Taehyeon Kim, Jaehoon Oh, Nakyil Kim, Sangwook Cho, Se-Young Yun
19 May 2021

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
08 Apr 2020 · MQ

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

A Mutual Information Maximization Perspective of Language Representation Learning
Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama
18 Oct 2019 · SSL

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019 · VLM

On Mutual Information Maximization for Representation Learning
Michael Tschannen, Josip Djolonga, Paul Kishan Rubenstein, Sylvain Gelly, Mario Lucic
31 Jul 2019 · SSL

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
02 May 2019 · ELM

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
Jason W. Wei, Kai Zou
31 Jan 2019

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
16 Jun 2016 · RALM