1904.09482
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
20 April 2019
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
FedML
Papers citing
"Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding"
16 / 16 papers shown
Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality
Rahul Zalkikar
Kanchan Chandra
72
1
0
21 Feb 2024
Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan
Yi Ren
Di He
Tao Qin
Zhou Zhao
Tie-Yan Liu
60
248
0
27 Feb 2019
Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
AI4CE
98
1,269
0
31 Jan 2019
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
Jason Phang
Thibault Févry
Samuel R. Bowman
67
467
0
02 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
882
93,936
0
11 Oct 2018
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
74
672
0
21 Sep 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
594
7,080
0
20 Apr 2018
Stochastic Answer Networks for Machine Reading Comprehension
Xiaodong Liu
Yelong Shen
Kevin Duh
Jianfeng Gao
RALM
28
198
0
10 Dec 2017
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
Hsin-Yuan Huang
Chenguang Zhu
Yelong Shen
Weizhu Chen
FedML
55
183
0
16 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
427
129,831
0
12 Jun 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
237
10,412
0
21 Jul 2016
Net2Net: Accelerating Learning via Knowledge Transfer
Tianqi Chen
Ian Goodfellow
Jonathon Shlens
88
663
0
18 Nov 2015
Bayesian Dark Knowledge
Anoop Korattikara
Vivek Rathod
Kevin Murphy
Max Welling
BDL
UQCV
48
258
0
14 Jun 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
198
19,448
0
09 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
736
149,474
0
22 Dec 2014
Natural Language Processing (almost) from Scratch
R. Collobert
Jason Weston
Léon Bottou
Michael Karlen
Koray Kavukcuoglu
Pavel P. Kuksa
121
7,711
0
02 Mar 2011