Optimal ablation for interpretability

16 September 2024
Maximilian Li, Lucas Janson
Topics: FAtt
ArXiv (abs) · PDF · HTML

Papers citing "Optimal ablation for interpretability"

44 / 44 papers shown

Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
24 Jun 2024 · 20 citations

How to use and interpret activation patching
Stefan Heimersheim, Neel Nanda
23 Apr 2024 · 47 citations

Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah, Andrew Ilyas, Aleksander Madry
Topics: KELM
17 Apr 2024 · 17 citations

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024 · 158 citations

Explorations of Self-Repair in Language Models
Cody Rushing, Neel Nanda
Topics: KELM, MILM, LRM
23 Feb 2024 · 13 citations

Function Vectors in Large Language Models
Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
23 Oct 2023 · 120 citations

Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
12 Oct 2023 · 71 citations

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda
Topics: LLMSV
27 Sep 2023 · 114 citations

Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau
Topics: KELM
17 Aug 2023 · 100 citations

LEACE: Perfect linear concept erasure in closed form
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
Topics: KELM, MU
06 Jun 2023 · 119 citations

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
07 Dec 2022 · 383 citations

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022 · 524 citations

Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
06 Sep 2022 · 92 citations

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Räuker, A. Ho, Stephen Casper, Dylan Hadfield-Menell
Topics: AAML, AI4CE
27 Jul 2022 · 133 citations

Locating and Editing Factual Associations in GPT
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov
Topics: KELM
10 Feb 2022 · 1,381 citations

Natural Language Descriptions of Deep Visual Features
Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas
Topics: MILM
26 Jan 2022 · 124 citations

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase, Harry Xie, Joey Tianyi Zhou
Topics: OODD, LRM, FAtt
01 Jun 2021 · 91 citations

Low-Complexity Probing via Finding Subnetworks
Steven Cao, Victor Sanh, Alexander M. Rush
08 Apr 2021 · 54 citations

Explaining by Removing: A Unified Framework for Model Explanation
Ian Covert, Scott M. Lundberg, Su-In Lee
Topics: FAtt
21 Nov 2020 · 251 citations

Interpretation of NLP models through input marginalization
Siwon Kim, Jihun Yi, Eunji Kim, Sungroh Yoon
Topics: MILM, FAtt
27 Oct 2020 · 60 citations

Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking
Michael Schlichtkrull, Nicola De Cao, Ivan Titov
Topics: AI4CE
01 Oct 2020 · 220 citations

Understanding the Role of Individual Units in a Deep Neural Network
David Bau, Jun-Yan Zhu, Hendrik Strobelt, Àgata Lapedriza, Bolei Zhou, Antonio Torralba
Topics: GAN
10 Sep 2020 · 452 citations

Neuron Shapley: Discovering the Responsible Neurons
Amirata Ghorbani, James Zou
Topics: FAtt, TDI
23 Feb 2020 · 113 citations

Restricting the Flow: Information Bottlenecks for Attribution
Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf
Topics: FAtt
02 Jan 2020 · 190 citations

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju
Topics: FAtt, AAML, MLAU
06 Nov 2019 · 821 citations

Feature relevance quantification in explainable AI: A causal problem
Dominik Janzing, Lenon Minorics, Patrick Blöbaum
Topics: FAtt, CML
29 Oct 2019 · 282 citations

CXPlain: Causal Explanations for Model Interpretation under Uncertainty
Patrick Schwab, W. Karlen
Topics: FAtt, CML
27 Oct 2019 · 209 citations

Understanding Deep Networks via Extremal Perturbations and Smooth Masks
Ruth C. Fong, Mandela Patrick, Andrea Vedaldi
Topics: AAML
18 Oct 2019 · 418 citations

The emergence of number and syntax units in LSTM language models
Yair Lakretz, Germán Kruszewski, T. Desbordes, Dieuwke Hupkes, S. Dehaene, Marco Baroni
18 Mar 2019 · 171 citations

A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker, D. Erhan, Pieter-Jan Kindermans, Been Kim
Topics: FAtt, UQCV
28 Jun 2018 · 682 citations

RISE: Randomized Input Sampling for Explanation of Black-box Models
Vitali Petsiuk, Abir Das, Kate Saenko
Topics: FAtt
19 Jun 2018 · 1,176 citations

Learning Efficient Convolutional Networks through Network Slimming
Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang
22 Aug 2017 · 2,424 citations

A Unified Approach to Interpreting Model Predictions
Scott M. Lundberg, Su-In Lee
Topics: FAtt
22 May 2017 · 22,002 citations

Real Time Image Saliency for Black Box Classifiers
P. Dabkowski, Y. Gal
22 May 2017 · 591 citations

Network Dissection: Quantifying Interpretability of Deep Visual Representations
David Bau, Bolei Zhou, A. Khosla, A. Oliva, Antonio Torralba
Topics: MILM, FAtt
19 Apr 2017 · 1,523 citations

Interpretable Explanations of Black Boxes by Meaningful Perturbation
Ruth C. Fong, Andrea Vedaldi
Topics: FAtt, AAML
11 Apr 2017 · 1,525 citations

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
Topics: OOD, FAtt
04 Mar 2017 · 6,015 citations

Understanding Neural Networks through Representation Erasure
Jiwei Li, Will Monroe, Dan Jurafsky
Topics: AAML, MILM
24 Dec 2016 · 567 citations

"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
Topics: FAtt, FaML
16 Feb 2016 · 17,027 citations

Object Detectors Emerge in Deep Scene CNNs
Bolei Zhou, A. Khosla, Àgata Lapedriza, A. Oliva, Antonio Torralba
Topics: ObjD
22 Dec 2014 · 1,283 citations

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Topics: FAtt
20 Dec 2013 · 7,308 citations

Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler, Rob Fergus
Topics: FAtt, SSL
12 Nov 2013 · 15,902 citations

How to Explain Individual Classification Decisions
D. Baehrens, T. Schroeter, Stefan Harmeling, M. Kawanabe, K. Hansen, K. Müller
Topics: FAtt
06 Dec 2009 · 1,104 citations

Variable importance in binary regression trees and forests
H. Ishwaran
15 Nov 2007 · 386 citations