Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
arXiv 2506.13727, 16 June 2025

Papers citing "Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs"

17 papers shown

Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov
19 Jul 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
08 Feb 2024

TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu
04 Jan 2024

Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy
16 Oct 2023

From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space
Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech Samek, Sebastian Lapuschkin
18 Aug 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
28 Apr 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
18 Apr 2021

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Jasmijn Bastings, Katja Filippova
12 Oct 2020

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 Sep 2020

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg
16 Apr 2020

Making deep neural networks right for the right scientific reasons by interacting with their explanations
P. Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting
15 Jan 2020

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning
Seul-Ki Yeom, P. Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, K. Müller, Wojciech Samek
18 Dec 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
23 Oct 2019

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
04 Mar 2017