Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
Dongkuan Xu, Ian En-Hsu Yen, Jinxi Zhao, Zhibin Xiao
arXiv:2104.08682 · 18 April 2021

Papers citing "Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm" (25 of 25 shown):

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers. Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber. 18 Feb 2024.
Comparing Rewinding and Fine-tuning in Neural Network Pruning. Alex Renda, Jonathan Frankle, Michael Carbin. 05 Mar 2020.
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett. 27 Feb 2020.
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou. 25 Feb 2020.
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mitchell A. Gordon, Kevin Duh, Nicholas Andrews. 19 Feb 2020.
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. 07 Feb 2020.
Fast Sparse ConvNets. Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan. 21 Nov 2019.
Q8BERT: Quantized 8Bit BERT. Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. 14 Oct 2019.
Structured Pruning of Large Language Models. Ziheng Wang, Jeremy Wohlwend, Tao Lei. 10 Oct 2019.
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 02 Oct 2019.
Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019.
TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. 23 Sep 2019.
Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. 25 Aug 2019.
Energy and Policy Considerations for Deep Learning in NLP. Emma Strubell, Ananya Ganesh, Andrew McCallum. 05 Jun 2019.
Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov. 23 May 2019.
The State of Sparsity in Deep Neural Networks. Trevor Gale, Erich Elsen, Sara Hooker. 25 Feb 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
Know What You Don't Know: Unanswerable Questions for SQuAD. Pranav Rajpurkar, Robin Jia, Percy Liang. 11 Jun 2018.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Jonathan Frankle, Michael Carbin. 09 Mar 2018.
Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. 15 Feb 2018.
To prune, or not to prune: exploring the efficacy of pruning for model compression. Michael Zhu, Suyog Gupta. 05 Oct 2017.
Learning both Weights and Connections for Efficient Neural Networks. Song Han, Jeff Pool, John Tran, William J. Dally. 08 Jun 2015.
Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, Jeff Dean. 09 Mar 2015.