Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
arXiv 2506.13727, 16 June 2025

Papers citing "Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs"

17 papers shown

Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov
19 Jul 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
08 Feb 2024

TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu
04 Jan 2024

Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy
16 Oct 2023

From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space
Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech Samek, Sebastian Lapuschkin
18 Aug 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
28 Apr 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
18 Apr 2021

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Jasmijn Bastings, Katja Filippova
12 Oct 2020

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 Sep 2020

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
16 Apr 2020

Making deep neural networks right for the right scientific reasons by interacting with their explanations
P. Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting
15 Jan 2020

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning
Seul-Ki Yeom, P. Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, K. Müller, Wojciech Samek
18 Dec 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
23 Oct 2019

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
04 Mar 2017