Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
arXiv:1908.08962 · 23 August 2019
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Cited By
Papers citing "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models" (50 of 51 papers shown; page 1 of 2)
| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay | Soumik Dey, Hansi Wu, Binbin Li | | 0 | 07 May 2025 |
| KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation | Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou | OffRL | 0 | 26 Apr 2025 |
| TRA: Better Length Generalisation with Threshold Relative Attention | Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao | | 0 | 29 Mar 2025 |
| Banyan: Improved Representation Learning with Explicit Structure | Mattia Opper, N. Siddharth | | 1 | 25 Jul 2024 |
| DE³-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks | Jianing He, Qi Zhang, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu, LongBing Cao | | 3 | 03 Feb 2024 |
| Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study | Maike Zufle, Verna Dankers, Ivan Titov | | 0 | 16 Nov 2023 |
| Towards Comparable Knowledge Distillation in Semantic Image Segmentation | Onno Niemann, Christopher Vox, Thorben Werner | VLM | 1 | 07 Sep 2023 |
| GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng Zhang, Jie Tang | ALM, MoE | 11 | 11 Jun 2023 |
| LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers | Xuanqing Liu, Zhuotao Liu | | 22 | 28 May 2023 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song | MoE | 19 | 20 May 2023 |
| Web Content Filtering through knowledge distillation of Large Language Models | Tamás Vörös, Sean P. Bergeron, Konstantin Berlin | | 7 | 08 May 2023 |
| EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms | Shikhar Tuli, N. Jha | | 3 | 24 Mar 2023 |
| Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study | Mingxu Tao, Yansong Feng, Dongyan Zhao | CLL, KELM | 10 | 02 Mar 2023 |
| AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers | Shikhar Tuli, N. Jha | | 32 | 28 Feb 2023 |
| AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization | Zhiying Xu, H. Peng, Wei Wang | GNN | 3 | 02 Dec 2022 |
| Gradient Knowledge Distillation for Pre-trained Language Models | Lean Wang, Lei Li, Xu Sun | VLM | 5 | 02 Nov 2022 |
| COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models | Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang | VLM | 4 | 27 Oct 2022 |
| An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification | Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott | | 34 | 11 Oct 2022 |
| Composable Text Controls in Latent Space with ODEs | Guangyi Liu, Zeyu Feng, Yuan Gao, Zichao Yang, Xiaodan Liang, Junwei Bao, Xiaodong He, Shuguang Cui, Zhen Li, Zhiting Hu | AI4CE, DiffM | 32 | 01 Aug 2022 |
| Knowledge Distillation of Transformer-based Language Models Revisited | Chengqiang Lu, Jianwei Zhang, Yunfei Chu, Zhengyu Chen, Jingren Zhou, Fei Wu, Haiqing Chen, Hongxia Yang | VLM | 10 | 29 Jun 2022 |
| Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm | Aidan Good, Jia-Huei Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, J. Wieczorek, Thiago Serra | | 11 | 07 Jun 2022 |
| Outlier Dimensions that Disrupt Transformers Are Driven by Frequency | Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell'Orletta | | 42 | 23 May 2022 |
| Weakly-supervised segmentation of referring expressions | Robin Strudel, Ivan Laptev, Cordelia Schmid | | 21 | 10 May 2022 |
| Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments | Christopher Hidey, Fei Liu, Rahul Goel | | 4 | 10 Apr 2022 |
| Deep Learning for Hate Speech Detection: A Comparative Study | Jitendra Malik, Hezhe Qiao, Guansong Pang, Anton Van Den Hengel | | 44 | 19 Feb 2022 |
| Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks | Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha | | 7 | 09 Nov 2021 |
| Towards Efficient NLP: A Standard Evaluation and A Strong Baseline | Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu | ELM | 46 | 13 Oct 2021 |
| Learning Compact Metrics for MT | Amy Pu, Hyung Won Chung, Ankur P. Parikh, Sebastian Gehrmann, Thibault Sellam | | 99 | 12 Oct 2021 |
| Dynamic Knowledge Distillation for Pre-trained Language Models | Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun | | 49 | 23 Sep 2021 |
| Block Pruning For Faster Transformers | François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush | VLM | 219 | 10 Sep 2021 |
| Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression | Canwen Xu, Wangchunshu Zhou, Tao Ge, Kelvin J. Xu, Julian McAuley, Furu Wei | | 41 | 07 Sep 2021 |
| DKM: Differentiable K-Means Clustering Layer for Neural Network Compression | Minsik Cho, Keivan Alizadeh Vahid, Saurabh N. Adya, Mohammad Rastegari | | 34 | 28 Aug 2021 |
| Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction | Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg | | 57 | 14 May 2021 |
| Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media | Ananya Srivastava, Md Musleh Uddin Hasan, Bhargav D. Yagnik, Rahee Walambe, K. Kotecha | | 7 | 11 May 2021 |
| ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques | Yuanxin Liu, Zheng Lin, Fengcheng Yuan | VLM, MQ | 18 | 21 Mar 2021 |
| Quantization-Guided Training for Compact TinyML Models | Sedigh Ghamari, Koray Ozcan, Thu Dinh, A. Melnikov, Juan Carvajal, Jan Ernst, S. Chai | MQ | 16 | 10 Mar 2021 |
| MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers | Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei | MQ | 257 | 31 Dec 2020 |
| CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade | Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun | | 51 | 29 Dec 2020 |
| LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding | Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li | | 58 | 14 Dec 2020 |
| Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads | Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun | VLM | 30 | 07 Nov 2020 |
| TopicBERT for Energy Efficient Document Classification | Yatin Chaudhary, Pankaj Gupta, Khushbu Saxena, Vivek Kulkarni, Thomas Runkler, Hinrich Schütze | | 21 | 15 Oct 2020 |
| Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks | Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum | SSL, VLM | 87 | 17 Sep 2020 |
| BERT-QE: Contextualized Query Expansion for Document Re-ranking | Zhi Zheng, Kai Hui, Xianpei Han, Le Sun, Andrew Yates | | 93 | 15 Sep 2020 |
| Compression of Deep Learning Models for Text: A Survey | Manish Gupta, Puneet Agrawal | VLM, MedIm, AI4CE | 115 | 12 Aug 2020 |
| Knowledge Distillation: A Survey | Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao | VLM | 2,857 | 09 Jun 2020 |
| Movement Pruning: Adaptive Sparsity by Fine-Tuning | Victor Sanh, Thomas Wolf, Alexander M. Rush | | 472 | 15 May 2020 |
| Pre-trained Models for Natural Language Processing: A Survey | Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang | LM&MA, VLM | 1,454 | 18 Mar 2020 |
| MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers | Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou | VLM | 1,214 | 25 Feb 2020 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut | SSL, AIMat | 6,380 | 26 Sep 2019 |
| Reducing Transformer Depth on Demand with Structured Dropout | Angela Fan, Edouard Grave, Armand Joulin | | 585 | 25 Sep 2019 |