Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2108.04657
Cited By
Differentiable Subset Pruning of Transformer Heads
10 August 2021
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Differentiable Subset Pruning of Transformer Heads"
15 / 15 papers shown
Title
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
22
0
0
04 Apr 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
55
3
0
11 Nov 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar
Tejaswini Pedapati
Ronny Luss
Soham Dan
Aurélie C. Lozano
Payel Das
Georgios Kollias
22
3
0
28 Feb 2024
Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li
Shaonan Wang
Yunhao Zhang
Jiajun Zhang
Chengqing Zong
38
4
0
16 Oct 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
28
3
0
24 May 2023
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren
Zi-Qi Zhong
Xingjian Shi
Yi Zhu
Chun Yuan
Mu Li
24
7
0
16 May 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Token Sparsification for Faster Medical Image Segmentation
Lei Zhou
Huidong Liu
Joseph Bae
Junjun He
Dimitris Samaras
Prateek Prasanna
MedIm
30
3
0
11 Mar 2023
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Michael Hassid
Hao Peng
Daniel Rotem
Jungo Kasai
Ivan Montero
Noah A. Smith
Roy Schwartz
32
24
0
07 Nov 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
34
13
0
04 Jul 2022
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
9
177
0
01 Apr 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
28
5
0
20 Mar 2022
Auto-Compressing Subset Pruning for Semantic Image Segmentation
Konstantin Ditschuneit
Johannes Otterbach
21
5
0
26 Jan 2022
Pruning Self-attentions into Convolutional Layers in Single Path
Haoyu He
Jianfei Cai
Jing Liu
Zizheng Pan
Jing Zhang
Dacheng Tao
Bohan Zhuang
ViT
34
40
0
23 Nov 2021
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
285
9,138
0
06 Jun 2015
1