DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
arXiv:1910.01108 · 2 October 2019
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
Papers citing "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (31 of 131 papers shown)
AutoMix: Automatically Mixing Language Models (19 Oct 2023) · 22 citations
Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, ..., Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam
An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records (16 Oct 2023) · OOD · 1 citation
Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
Certifying LLM Safety against Adversarial Prompting (06 Sep 2023) · AAML · 186 citations
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature (23 Aug 2023) · 2 citations
Walter Hernandez Cruz, K. Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, J. Shangguan, Paolo Tasca, Jiahua Xu
UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition (20 Jul 2023) · MedIm · 3 citations
Aidan Mannion, Thierry Chevalier, D. Schwab, Lorraine Goeuriot
Generalizable Synthetic Image Detection via Language-guided Contrastive Learning (23 May 2023) · 30 citations
Haiwei Wu, Jiantao Zhou, Shile Zhang
Curating corpora with classifiers: A case study of clean energy sentiment online (04 May 2023) · 0 citations
M. V. Arnold, P. Dodds, C. Danforth
Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR (20 Aug 2020) · 35 citations
Juri Opitz, Anette Frank
Ensemble Distillation for Robust Model Fusion in Federated Learning (12 Jun 2020) · FedML · 1,031 citations
Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
Efficient Intent Detection with Dual Sentence Encoders (10 Mar 2020) · VLM · 466 citations
I. Casanueva, Tadas Temčinas, D. Gerz, Matthew Henderson, Ivan Vulić
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models (04 Mar 2020) · SSeg · 94 citations
Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman
TinyBERT: Distilling BERT for Natural Language Understanding (23 Sep 2019) · VLM · 1,847 citations
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019) · MoE · 1,861 citations
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Small and Practical BERT Models for Sequence Labeling (31 Aug 2019) · VLM · 121 citations
Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (23 Aug 2019) · 224 citations
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
RoBERTa: A Robustly Optimized BERT Pretraining Approach (26 Jul 2019) · AIMat · 24,160 citations
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
XLNet: Generalized Autoregressive Pretraining for Language Understanding (19 Jun 2019) · AI4CE · 8,386 citations
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Energy and Policy Considerations for Deep Learning in NLP (05 Jun 2019) · 2,633 citations
Emma Strubell, Ananya Ganesh, Andrew McCallum
Are Sixteen Heads Really Better than One? (25 May 2019) · MoE · 1,051 citations
Paul Michel, Omer Levy, Graham Neubig
Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System (21 Apr 2019) · KELM · 20 citations
Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang
Making Neural Machine Reading Comprehension Faster (29 Mar 2019) · AIMat · 9 citations
Debajyoti Chatterjee
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (28 Mar 2019) · 419 citations
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
Improved Knowledge Distillation via Teacher Assistant (09 Feb 2019) · 1,073 citations
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, H. Ghasemzadeh
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (11 Oct 2018) · VLM · SSL · SSeg · 93,936 citations
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018) · ELM · 7,080 citations
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
Deep contextualized word representations (15 Feb 2018) · NAI · 11,520 citations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
Attention Is All You Need (12 Jun 2017) · 3DV · 129,831 citations
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
SQuAD: 100,000+ Questions for Machine Comprehension of Text (16 Jun 2016) · RALM · 8,067 citations
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books (22 Jun 2015) · 2,529 citations
Yukun Zhu, Ryan Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler
Distilling the Knowledge in a Neural Network (09 Mar 2015) · FedML · 19,523 citations
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Deep Learning with Limited Numerical Precision (09 Feb 2015) · HAI · 2,043 citations
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan