Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm

18 April 2021
Dongkuan Xu, Ian En-Hsu Yen, Jinxi Zhao, Zhibin Xiao
VLM, AAML

Papers citing "Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm"

25 papers shown

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber · 18 Feb 2024

Comparing Rewinding and Fine-tuning in Neural Network Pruning
Alex Renda, Jonathan Frankle, Michael Carbin · 05 Mar 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · AI4CE · 27 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou · VLM · 25 Feb 2020

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews · VLM · 19 Feb 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 07 Feb 2020

Fast Sparse ConvNets
Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan · 21 Nov 2019

Q8BERT: Quantized 8Bit BERT
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat · MQ · 14 Oct 2019

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei · 10 Oct 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf · 02 Oct 2019

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin · 25 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu · VLM · 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu · 25 Aug 2019

Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum · 05 Jun 2019

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig · MoE · 25 May 2019

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov · 23 May 2019

The State of Sparsity in Deep Neural Networks
Trevor Gale, Erich Elsen, Sara Hooker · 25 Feb 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · VLM, SSL, SSeg · 11 Oct 2018

Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar, Robin Jia, Percy Liang · RALM, ELM · 11 Jun 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin · 09 Mar 2018

Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer · NAI · 15 Feb 2018

To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael Zhu, Suyog Gupta · 05 Oct 2017

Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, J. Tran, W. Dally · CVBM · 08 Jun 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean · FedML · 09 Mar 2015