Transfer Learning for Structured Pruning under Limited Task Data

10 November 2023

Papers citing "Transfer Learning for Structured Pruning under Limited Task Data"


LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Tim Dettmers M. Lewis Younes Belkada Luke Zettlemoyer MQ 105 662 0 15 Aug 2022
Structured Pruning Learns Compact and Accurate Models Mengzhou Xia Zexuan Zhong Danqi Chen VLM 69 187 0 01 Apr 2022
Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral Lucio Dery Yann N. Dauphin David Grangier MoMe 71 29 0 25 Aug 2021
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks Suchin Gururangan Ana Marasović Swabha Swayamdipta Kyle Lo Iz Beltagy Doug Downey Noah A. Smith VLM AI4CE CLL 164 2,435 0 23 Apr 2020
What's Hidden in a Randomly Weighted Neural Network? Vivek Ramanujan Mitchell Wortsman Aniruddha Kembhavi Ali Farhadi Mohammad Rastegari 66 361 0 29 Nov 2019
Reducing Transformer Depth on Demand with Structured Dropout Angela Fan Edouard Grave Armand Joulin 120 596 0 25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding Xiaoqi Jiao Yichun Yin Lifeng Shang Xin Jiang Xiao Chen Linlin Li F. Wang Qun Liu VLM 113 1,869 0 23 Sep 2019
Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform? Xiaolong Ma Sheng Lin Shaokai Ye Zhezhi He Linfeng Zhang ... Deliang Fan Xuehai Qian Xue Lin Kaisheng Ma Yanzhi Wang MQ 100 92 0 03 Jul 2019
Are Sixteen Heads Really Better than One? Paul Michel Omer Levy Graham Neubig MoE 107 1,068 0 25 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned Elena Voita David Talbot F. Moiseev Rico Sennrich Ivan Titov 117 1,148 0 23 May 2019
Rethinking the Value of Network Pruning Zhuang Liu Mingjie Sun Tinghui Zhou Gao Huang Trevor Darrell 38 1,474 0 11 Oct 2018
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction Yi Luan Luheng He Mari Ostendorf Hannaneh Hajishirzi 116 684 0 29 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.1K 7,200 0 20 Apr 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Jonathan Frankle Michael Carbin 263 3,488 0 09 Mar 2018
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts Franck Dernoncourt Ji Young Lee 71 230 0 17 Oct 2017
Learning both Weights and Connections for Efficient Neural Networks Song Han Jeff Pool J. Tran W. Dally CVBM 316 6,700 0 08 Jun 2015
The Benefit of Multitask Representation Learning Andreas Maurer Massimiliano Pontil Bernardino Romera-Paredes SSL 109 376 0 23 May 2015