arXiv:2305.18239
A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
26 May 2023
Hayeon Lee
Rui Hou
Jongpil Kim
Davis Liang
Sung Ju Hwang
Alexander Min
Papers citing "A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models"
14 / 14 papers shown
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye
Yang Xiao
Bo Hui
ALM
ELM
OffRL
109
4
0
16 Oct 2024
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
Yuxuan Yao
Han Wu
Mingyang Liu
Sichun Luo
Xiongwei Han
Jie Liu
Zhijiang Guo
Linqi Song
96
7
0
03 Oct 2024
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang
Hangbo Bao
Shaohan Huang
Li Dong
Furu Wei
MQ
103
270
0
31 Dec 2020
FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search
Jiemin Fang
Yuzhu Sun
Qian Zhang
Kangjian Peng
Yuan Li
Wenyu Liu
Xinggang Wang
SSeg
100
34
0
21 Jun 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
179
1,282
0
25 Feb 2020
Fast Neural Network Adaptation via Parameter Remapping and Architecture Search
Jiemin Fang
Yuzhu Sun
Kangjian Peng
Qian Zhang
Yuan Li
Wenyu Liu
Xinggang Wang
SSeg
49
34
0
08 Jan 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
255
7,554
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
Fang Wang
Qun Liu
VLM
113
1,872
0
23 Sep 2019
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
244
1,413
0
31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,201
0
20 Apr 2018
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
Iñigo Lopez-Gazpio
Lucia Specia
445
1,891
0
31 Jul 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
524
4,497
0
18 Apr 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,174
0
16 Jun 2016
Net2Net: Accelerating Learning via Knowledge Transfer
Tianqi Chen
Ian Goodfellow
Jonathon Shlens
187
672
0
18 Nov 2015