Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models (arXiv:2112.07327)
14 December 2021
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
Topic tag: MoMe

Papers citing "Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models"

All 27 citing papers are shown. Each entry gives the title, the authors, any ResearchTrend topic tags, the page's three counters separated by slashes (the middle figure appears to be a citation count), and the publication date.
Dynamic Knowledge Distillation for Pre-trained Language Models
Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
70 / 49 / 0 · 23 Sep 2021

One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Chuhan Wu, Fangzhao Wu, Yongfeng Huang
48 / 65 / 0 · 02 Jun 2021

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick, Sahana Udupa, Hinrich Schütze
302 / 384 / 0 · 28 Feb 2021

Towards Debiasing Sentence Representations
Paul Pu Liang, Irene Li, Emily Zheng, Y. Lim, Ruslan Salakhutdinov, Louis-Philippe Morency
70 / 238 / 0 · 16 Jul 2020

Contextualizing Hate Speech Classifiers with Post-hoc Explanation
Brendan Kennedy, Xisen Jin, Aida Mostafazadeh Davani, Morteza Dehghani, Xiang Ren
80 / 141 / 0 · 05 May 2020

Revisiting Pre-Trained Models for Chinese Natural Language Processing
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu
77 / 697 / 0 · 29 Apr 2020

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett
UQLM · 280 / 300 / 0 · 17 Mar 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
VLM · 122 / 1,260 / 0 · 25 Feb 2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
AIMat · 373 / 20,053 / 0 · 23 Oct 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
200 / 7,481 / 0 · 02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM · 92 / 1,857 / 0 · 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
121 / 836 / 0 · 25 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat · 522 / 24,351 / 0 · 26 Jul 2019

Green AI
Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni
102 / 1,134 / 0 · 22 Jul 2019

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning
Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Xiuming Zhang
MoMe · 35 / 47 / 0 · 24 Jun 2019

Measuring Bias in Contextualized Word Representations
Keita Kurita, Nidhi Vyas, Ayush Pareek, A. Black, Yulia Tsvetkov
98 / 449 / 0 · 18 Jun 2019

Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
Ran Zmigrod, Sabrina J. Mielke, Hanna M. Wallach, Ryan Cotterell
61 / 281 / 0 · 11 Jun 2019

Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
62 / 2,647 / 0 · 05 Jun 2019

Unifying Heterogeneous Classifiers with Distillation
J. Vongkulbhisal, Phongtharin Vinayavekhin, M. V. Scarzanella
42 / 51 / 0 · 12 Apr 2019

Amalgamating Knowledge towards Comprehensive Classification
Chengchao Shen, L. Câlmâc, Mingli Song, Li Sun, Xiuming Zhang
MoMe · 55 / 89 / 0 · 07 Nov 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg · 1.5K / 94,511 / 0 · 11 Oct 2018

On Calibration of Modern Neural Networks
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
UQCV · 262 / 5,812 / 0 · 14 Jun 2017

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
Dan Hendrycks, Kevin Gimpel
UQCV · 139 / 3,441 / 0 · 07 Oct 2016

Character-level Convolutional Networks for Text Classification
Xiang Zhang, Jiaqi Zhao, Yann LeCun
248 / 6,101 / 0 · 04 Sep 2015

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani
UQCV, BDL · 690 / 9,290 / 0 · 06 Jun 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
FedML · 314 / 19,609 / 0 · 09 Mar 2015

FitNets: Hints for Thin Deep Nets
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio
FedML · 280 / 3,870 / 0 · 19 Dec 2014