Don't trust your eyes: on the (un)reliability of feature visualizations
Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
arXiv 2306.04719 · 7 June 2023 · Tags: FAtt, OOD
Papers citing "Don't trust your eyes: on the (un)reliability of feature visualizations" (23 of 23 papers shown)
Probing the Vulnerability of Large Language Models to Polysemantic Interventions · Bofan Gong, Shiyang Lai, Dawn Song (AAML, MILM) · 16 May 2025
Decoding Vision Transformers: the Diffusion Steering Lens · Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai (DiffM) · 18 Apr 2025
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models · Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su (VLM) · 10 Feb 2025
Dimensions underlying the representational alignment of deep neural networks with humans · F. Mahner, Lukas Muttenthaler, Umut Güçlü, M. Hebart · 28 Jan 2025
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations · Laura O'Mahony, Nikola S. Nikolov, David JP O'Sullivan · 15 Nov 2024
From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation · Géraldin Nanfack, Michael Eickenberg, Eugene Belilovsky (FAtt, AAML, GNN) · 03 Jun 2024
CoSy: Evaluating Textual Explanations of Neurons · Laura Kopf, P. Bommer, Anna Hedström, Sebastian Lapuschkin, Marina M.-C. Höhne, Kirill Bykov · 30 May 2024
Interpretability Needs a New Paradigm · Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar · 08 May 2024
Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey · Rokas Gipiškis, Chun-Wei Tsai, Olga Kurasova · 02 May 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) · Usha Bhalla, Alexander X. Oesterling, Suraj Srinivas, Flavio du Pin Calmon, Himabindu Lakkaraju · 16 Feb 2024
Feature Accentuation: Revealing 'What' Features Respond to in Natural Images · Christopher Hamblin, Thomas Fel, Srijani Saha, Talia Konkle, George A. Alvarez (FAtt) · 15 Feb 2024
Manipulating Feature Visualizations with Gradient Slingshots · Dilyara Bareeva, Marina M.-C. Höhne, Alexander Warnecke, Lukas Pirch, Klaus-Robert Müller, Konrad Rieck, Kirill Bykov (AAML) · 11 Jan 2024
Labeling Neural Representations with Inverse Recognition · Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M.-C. Höhne (BDL) · 22 Nov 2023
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors? · Zachariah Carmichael, Walter J. Scheirer (FAtt) · 27 Oct 2023
Getting aligned on representational alignment · Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, ..., Thomas Unterthiner, Andrew Kyle Lampinen, Klaus-Robert Muller, M. Toneva, Thomas Griffiths · 18 Oct 2023
SPADE: Sparsity-Guided Debugging for Deep Neural Networks · Arshia Soltani Moakhar, Eugenia Iofinova, Elias Frantar, Dan Alistarh · 06 Oct 2023
The Blame Problem in Evaluating Local Explanations, and How to Tackle it · Amir Hossein Akhavan Rahnama (ELM, FAtt) · 05 Oct 2023
Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability · Ziyin Li, Bao Feng · 29 Sep 2023
Scale Alone Does not Improve Mechanistic Interpretability in Vision Models · Roland S. Zimmermann, Thomas Klein, Wieland Brendel · 11 Jul 2023
Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization · Thomas Fel, Thibaut Boissin, Victor Boutin, Agustin Picard, Paul Novello, ..., Drew Linsley, Tom Rousseau, Rémi Cadène, Laurent Gardes, Thomas Serre (FAtt) · 11 Jun 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes · Kimberly T. Mai, Sergi D. Bray, Toby O. Davies, Lewis D. Griffin · 19 Jan 2023
Attribution-based Explanations that Provide Recourse Cannot be Robust · H. Fokkema, R. D. Heide, T. Erven (FAtt) · 31 May 2022
Towards A Rigorous Science of Interpretable Machine Learning · Finale Doshi-Velez, Been Kim (XAI, FaML) · 28 Feb 2017