Don't trust your eyes: on the (un)reliability of feature visualizations
Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
arXiv 2306.04719 · 7 June 2023 · Tags: FAtt, OOD
Papers citing "Don't trust your eyes: on the (un)reliability of feature visualizations" (23 of 23 papers shown)
Probing the Vulnerability of Large Language Models to Polysemantic Interventions · Bofan Gong, Shiyang Lai, Dawn Song (AAML, MILM) · 16 May 2025
Decoding Vision Transformers: the Diffusion Steering Lens · Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai (DiffM) · 18 Apr 2025
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models · Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su (VLM) · 10 Feb 2025
Dimensions underlying the representational alignment of deep neural networks with humans · F. Mahner, Lukas Muttenthaler, Umut Güçlü, M. Hebart · 28 Jan 2025
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations · Laura O'Mahony, Nikola S. Nikolov, David JP O'Sullivan · 15 Nov 2024
From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation · Géraldin Nanfack, Michael Eickenberg, Eugene Belilovsky (FAtt, AAML, GNN) · 03 Jun 2024
CoSy: Evaluating Textual Explanations of Neurons · Laura Kopf, P. Bommer, Anna Hedström, Sebastian Lapuschkin, Marina M.-C. Höhne, Kirill Bykov · 30 May 2024
Interpretability Needs a New Paradigm · Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar · 08 May 2024
Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey · Rokas Gipiškis, Chun-Wei Tsai, Olga Kurasova · 02 May 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) · Usha Bhalla, Alexander X. Oesterling, Suraj Srinivas, Flavio du Pin Calmon, Himabindu Lakkaraju · 16 Feb 2024
Feature Accentuation: Revealing 'What' Features Respond to in Natural Images · Christopher Hamblin, Thomas Fel, Srijani Saha, Talia Konkle, George A. Alvarez (FAtt) · 15 Feb 2024
Manipulating Feature Visualizations with Gradient Slingshots · Dilyara Bareeva, Marina M.-C. Höhne, Alexander Warnecke, Lukas Pirch, Klaus-Robert Müller, Konrad Rieck, Kirill Bykov (AAML) · 11 Jan 2024
Labeling Neural Representations with Inverse Recognition · Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M.-C. Höhne (BDL) · 22 Nov 2023
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors? · Zachariah Carmichael, Walter J. Scheirer (FAtt) · 27 Oct 2023
Getting aligned on representational alignment · Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, ..., Thomas Unterthiner, Andrew Kyle Lampinen, Klaus-Robert Muller, M. Toneva, Thomas Griffiths · 18 Oct 2023
SPADE: Sparsity-Guided Debugging for Deep Neural Networks · Arshia Soltani Moakhar, Eugenia Iofinova, Elias Frantar, Dan Alistarh · 06 Oct 2023
The Blame Problem in Evaluating Local Explanations, and How to Tackle it · Amir Hossein Akhavan Rahnama (ELM, FAtt) · 05 Oct 2023
Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability · Ziyin Li, Bao Feng · 29 Sep 2023
Scale Alone Does not Improve Mechanistic Interpretability in Vision Models · Roland S. Zimmermann, Thomas Klein, Wieland Brendel · 11 Jul 2023
Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization · Thomas Fel, Thibaut Boissin, Victor Boutin, Agustin Picard, Paul Novello, ..., Drew Linsley, Tom Rousseau, Rémi Cadène, Laurent Gardes, Thomas Serre (FAtt) · 11 Jun 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes · Kimberly T. Mai, Sergi D. Bray, Toby O. Davies, Lewis D. Griffin · 19 Jan 2023
Attribution-based Explanations that Provide Recourse Cannot be Robust · H. Fokkema, R. D. Heide, T. Erven (FAtt) · 31 May 2022
Towards A Rigorous Science of Interpretable Machine Learning · Finale Doshi-Velez, Been Kim (XAI, FaML) · 28 Feb 2017