Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2506.13727
Cited By
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
16 June 2025
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Reduan Achtibat
Patrick Kahardipraja
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs"
16 / 16 papers shown
Title
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
86
54
0
19 Jul 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
133
158
0
28 Mar 2024
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Aakriti Jain
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
64
37
0
08 Feb 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
145
406
0
04 Jan 2024
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed
Can Rager
Arthur Conmy
133
67
0
16 Oct 2023
From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space
Maximilian Dreyer
Frederik Pahde
Christopher J. Anders
Wojciech Samek
Sebastian Lapuschkin
AI4CE
55
11
0
18 Aug 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
883
13,148
0
04 Mar 2022
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
94
463
0
18 Apr 2021
The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Jasmijn Bastings
Katja Filippova
XAI
LRM
95
178
0
12 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
163
1,214
0
24 Sep 2020
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
138
388
0
16 Apr 2020
Making deep neural networks right for the right scientific reasons by interacting with their explanations
P. Schramowski
Wolfgang Stammer
Stefano Teso
Anna Brugger
Xiaoting Shao
Hans-Georg Luigs
Anne-Katrin Mahlein
Kristian Kersting
104
213
0
15 Jan 2020
Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning
Seul-Ki Yeom
P. Seegerer
Sebastian Lapuschkin
Alexander Binder
Simon Wiedemann
K. Müller
Wojciech Samek
CVBM
65
209
0
18 Dec 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
459
20,298
0
23 Oct 2019
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
Axiomatic Attribution for Deep Networks
Mukund Sundararajan
Ankur Taly
Qiqi Yan
OOD
FAtt
191
6,018
0
04 Mar 2017
1