Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

28 March 2019 · arXiv:1903.12136
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
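For context on the paper being cited: Tang et al. distill a fine-tuned BERT teacher into a single-layer BiLSTM student, training the student to match the teacher's output logits with a mean-squared-error penalty alongside the usual cross-entropy on gold labels. The PyTorch sketch below illustrates that objective; the class and function names are illustrative, not taken from the authors' code.

```python
# Minimal sketch of logit-matching distillation in the style of
# Tang et al. (2019). Names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    """Single-layer BiLSTM classifier, the student architecture the paper targets."""
    def __init__(self, vocab_size: int, embed_dim: int = 300,
                 hidden_dim: int = 300, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(token_ids)                # (batch, seq, embed)
        _, (h_n, _) = self.lstm(embedded)               # h_n: (2, batch, hidden)
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)  # final fwd + bwd states
        return self.classifier(sentence)                # raw logits

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Cross-entropy on gold labels mixed with MSE against the teacher's logits."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.mse_loss(student_logits, teacher_logits)   # match raw logits, not softmax
    return alpha * hard + (1.0 - alpha) * soft
```

In training, teacher_logits would come from a fine-tuned BERT run in eval mode over the same (optionally augmented) inputs, with gradients flowing only through the student.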

Papers citing "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks"

31 / 81 papers shown
• Reinforced Multi-Teacher Selection for Knowledge Distillation
  Fei Yuan, Linjun Shou, J. Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang · 11 Dec 2020 · 121 citations

• EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
  Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Jun Huang, Deng Cai, Wei Lin · 18 Nov 2020 · 20 citations · Communities: VLM, SyDa

• Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
  Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun · 07 Nov 2020 · 30 citations · Communities: VLM

• BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
  Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin · 20 Oct 2020 · 17 citations

• Pretrained Transformers for Text Ranking: BERT and Beyond
  Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates · 13 Oct 2020 · 611 citations · Communities: VLM

• Load What You Need: Smaller Versions of Multilingual BERT
  Amine Abdaoui, Camille Pradel, Grégoire Sigel · 12 Oct 2020 · 72 citations

• Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
  Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu · 10 Oct 2020 · 10 citations

• Efficient Transformers: A Survey
  Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · 14 Sep 2020 · 1,102 citations · Communities: VLM

• DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning
  Yushan Zhu, Wen Zhang, Mingyang Chen, Hui Chen, Xu-Xin Cheng, Wei Zhang, Huajun Chen (Zhejiang University) · 13 Sep 2020 · 15 citations

• Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage
  Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, D. Dov, Guoyin Wang, Ricardo Henao, Lawrence Carin · 22 Jun 2020 · 24 citations

• Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao · 09 Jun 2020 · 2,843 citations · Communities: VLM

• Movement Pruning: Adaptive Sparsity by Fine-Tuning
  Victor Sanh, Thomas Wolf, Alexander M. Rush · 15 May 2020 · 468 citations

• Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling
  Amy Breden, L. Moore · 11 May 2020 · 13 citations

• GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
  Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos · 08 May 2020 · 183 citations · Communities: MQ

• DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
  Qingqing Cao, H. Trivedi, A. Balasubramanian, Niranjan Balasubramanian · 02 May 2020 · 66 citations

• Distilling Knowledge for Fast Retrieval-based Chat-bots
  Amir Vakili Tahami, Kamyar Ghajar, A. Shakery · 23 Apr 2020 · 31 citations

• The Right Tool for the Job: Matching Model and Instance Complexities
  Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith · 16 Apr 2020 · 167 citations

• Squeezed Deep 6DoF Object Detection Using Knowledge Distillation
  H. Felix, Walber M. Rodrigues, David Macêdo, Francisco Simões, Adriano Oliveira, Veronica Teichrieb, Cleber Zanchettin · 30 Mar 2020 · 9 citations · Communities: 3DPC

• Pre-trained Models for Natural Language Processing: A Survey
  Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang · 18 Mar 2020 · 1,452 citations · Communities: LM&MA, VLM

• A Survey on Contextual Embeddings
  Qi Liu, Matt J. Kusner, Phil Blunsom · 16 Mar 2020 · 146 citations

• TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
  Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang · 16 Mar 2020 · 31 citations

• MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou · 25 Feb 2020 · 1,203 citations · Communities: VLM

• A Financial Service Chatbot based on Deep Bidirectional Transformers
  S. Yu, Yuxin Chen, Hussain Zaidi · 17 Feb 2020 · 33 citations

• Pre-training Tasks for Embedding-based Large-scale Retrieval
  Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar · 10 Feb 2020 · 301 citations · Communities: RALM

• ConveRT: Efficient and Accurate Conversational Representations from Transformers
  Matthew Henderson, I. Casanueva, Nikola Mrkšić, Pei-hao Su, Tsung-Hsien Wen, Ivan Vulić · 09 Nov 2019 · 196 citations

• Blockwise Self-Attention for Long Document Understanding
  J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang · 07 Nov 2019 · 251 citations

• Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
  Subhabrata Mukherjee, Ahmed Hassan Awadallah · 04 Oct 2019 · 25 citations

• Reducing Transformer Depth on Demand with Structured Dropout
  Angela Fan, Edouard Grave, Armand Joulin · 25 Sep 2019 · 584 citations

• DocBERT: BERT for Document Classification
  Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy J. Lin · 17 Apr 2019 · 296 citations · Communities: LLMAG, VLM

• GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 20 Apr 2018 · 6,984 citations · Communities: ELM

• Convolutional Neural Networks for Sentence Classification
  Yoon Kim · 25 Aug 2014 · 13,368 citations · Communities: AILaw, VLM