Network Dissection: Quantifying Interpretability of Deep Visual Representations

19 April 2017
David Bau, Bolei Zhou, A. Khosla, A. Oliva, Antonio Torralba
MILM, FAtt
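For context on the method the papers below build on: Network Dissection scores each convolutional unit by how well its thresholded, upsampled activation map overlaps pixel-level concept annotations (the Broden probe dataset), measured by intersection-over-union. Below is a minimal NumPy sketch of that score, not the authors' released code; the 0.5% activation quantile follows the paper, while the function and argument names are illustrative.

```python
import numpy as np

def dissection_iou(unit_acts, concept_masks, top_quantile=0.005):
    """IoU between one unit's binarized activations and one concept's masks.

    unit_acts:     (N, H, W) float array; the unit's activation maps,
                   upsampled to input resolution over a probe dataset.
    concept_masks: (N, H, W) bool array; pixel-level ground truth for
                   one concept on the same images.
    top_quantile:  the paper picks a threshold T_k per unit such that
                   P(a_k > T_k) = 0.005 over all probe pixels.
    """
    t_k = np.quantile(unit_acts, 1.0 - top_quantile)  # dataset-wide threshold
    unit_masks = unit_acts > t_k                      # binary segmentation M_k
    inter = np.logical_and(unit_masks, concept_masks).sum()
    union = np.logical_or(unit_masks, concept_masks).sum()
    return inter / union if union > 0 else 0.0
```

In the paper, a unit counts as a detector for the concept with its highest IoU when that score exceeds 0.04, and a layer's interpretability is quantified as the number of distinct concepts its units detect this way.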

Papers citing "Network Dissection: Quantifying Interpretability of Deep Visual Representations"

Showing 50 of 787 papers.
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Jingtong Su, Julia Kempe, Karen Ullrich · 20 Jun 2025

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
Laura Kopf, Nils Feldhus, Kirill Bykov, P. Bommer, Anna Hedström, Marina M.-C. Höhne, Oliver Eberle · 18 Jun 2025

NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance
Anju Chhetri, Jari Korhonen, P. Gyawali, Binod Bhattarai · OODD · 18 Jun 2025

Vision Transformers Don't Need Trained Registers
Nick Jiang, Amil Dravid, Alexei A. Efros, Yossi Gandelsman · 09 Jun 2025

InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
Yifan Luo, Zhennan Zhou, Bin Dong · 09 Jun 2025

CASE: Contrastive Activation for Saliency Estimation
Dane Williamson, Yangfeng Ji, Matthew B. Dwyer · FAtt, AAML · 08 Jun 2025

Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
Tuomas P. Oikarinen, Ge Yan, Tsui-Wei Weng · FAtt, XAI · 06 Jun 2025

Relevance-driven Input Dropout: an Explanation-guided Regularization Technique
Shreyas Gururaj, Lars Grüne, Wojciech Samek, Sebastian Lapuschkin, Leander Weber · 27 May 2025

FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models
Nils Neukirch, Johanna Vielhaben, Nils Strodthoff · DiffM · 27 May 2025

FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks
Laines Schmalwasser, Niklas Penzel, Joachim Denzler, Julia Niebling · 23 May 2025

Refining Neural Activation Patterns for Layer-Level Concept Discovery in Neural Network-Based Receivers
Marko Tuononen, Duy Vu, Dani Korpi, Vesa Starck, Ville Hautamäki · 21 May 2025

The Spotlight Resonance Method: Resolving the Alignment of Embedded Activations
George Bird · 09 May 2025

Causal Intervention Framework for Variational Auto Encoder Mechanistic Interpretability
Dip Roy · CML · 06 May 2025

ChannelExplorer: Exploring Class Separability Through Activation Channel Visualization
Md Rahat-uz-Zaman, Bei Wang, Paul Rosen · 06 May 2025

Task Reconstruction and Extrapolation for $π_0$ using Text Latent
Quanyi Li · 06 May 2025

The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen, Yimeng Zhang, Sijia Liu, Q. Qu · AAML · 30 Apr 2025

Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Emiliano Penaloza, Tianyue H. Zhan, Laurent Charlin, Mateo Espinosa Zarlenga · 25 Apr 2025

Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
M. Zarlenga, Gabriele Dominici, Pietro Barbiero, Z. Shams, M. Jamnik · KELM · 24 Apr 2025

Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai · DiffM · 18 Apr 2025

Towards Spatially-Aware and Optimally Faithful Concept-Based Explanations
Shubham Kumar, Dwip Dalal, Narendra Ahuja · 15 Apr 2025

Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Saif Punjwani, Larry Heck · LRM · 14 Apr 2025

On Background Bias of Post-Hoc Concept Embeddings in Computer Vision DNNs
Gesina Schwalbe, Georgii Mikriukov, Edgar Heinert, Stavros Gerolymatos, Mert Keser, Alois Knoll, Matthias Rottmann, Annika Mütze · 11 Apr 2025

From Colors to Classes: Emergence of Concepts in Vision Transformers
Teresa Dorszewski, Lenka Tětková, Robert Jenssen, Lars Kai Hansen, Kristoffer Wickstrøm · 31 Mar 2025

Towards Human-Understandable Multi-Dimensional Concept Discovery
Arne Grobrugge, Niklas Kühl, G. Satzger, Philipp Spitzer · 24 Mar 2025

Automated Processing of eXplainable Artificial Intelligence Outputs in Deep Learning Models for Fault Diagnostics of Large Infrastructures
Giovanni Floreale, Piero Baraldi, Enrico Zio, Olga Fink · 19 Mar 2025

Shape Bias and Robustness Evaluation via Cue Decomposition for Image Classification and Segmentation
Edgar Heinert, Thomas Gottwald, Annika Mütze, Matthias Rottmann · 16 Mar 2025

Learning Interpretable Logic Rules from Deep Vision Models
Chuqin Geng, Yuhe Jiang, Ziyu Zhao, Haolin Ye, Zhaoyue Wang, X. Si · NAI, FAtt, VLM · 13 Mar 2025

C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu, Junchi Liao, Weimin Lyu, Shaopeng Fu, Tianhao Huang, Shu Yang, Guimin Hu, Di Wang · AAML · 12 Mar 2025

Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi, Chong Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren · ViT · 12 Mar 2025

QPM: Discrete Optimization for Globally Interpretable Image Classification
Thomas Norrenbrock, Timo Kaiser, Sovan Biswas, R. Manuvinakurike, Bodo Rosenhahn · 27 Feb 2025

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla · 26 Feb 2025

Model Lakes
Koyena Pal, David Bau, Renée J. Miller · 24 Feb 2025

LaVCa: LLM-assisted Visual Cortex Captioning
Takuya Matsuyama, Shinji Nishimoto, Yu Takagi · 20 Feb 2025

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, M. Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle · 18 Feb 2025

TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez · LRM · 17 Feb 2025

Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships
Angie Boggust, Hyemin Bang, Hendrik Strobelt, Arvind Satyanarayan · 17 Feb 2025

Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su · VLM · 10 Feb 2025

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel, M. Kowal, Konstantinos G. Derpanis · 06 Feb 2025

Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning
Zeyu Jiang, Hai Huang, Xingquan Zuo · OffRL · 02 Feb 2025

Dimensions underlying the representational alignment of deep neural networks with humans
F. Mahner, Lukas Muttenthaler, Umut Güçlü, M. Hebart · 28 Jan 2025

Faithful Counterfactual Visual Explanations (FCVE)
Bismillah Khan, Syed Ali Tariq, Tehseen Zia, Muhammad Ahsan, David Windridge · 12 Jan 2025

Interpreting Deep Neural Network-Based Receiver Under Varying Signal-To-Noise Ratios
Marko Tuononen, Dani Korpi, Ville Hautamäki · FAtt · 10 Jan 2025

Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts
Jihye Choi, Jayaram Raghuram, Yixuan Li, Somesh Jha · 18 Dec 2024

Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks
Elad Shoham, Hadar Cohen, Khalil Wattad, Havana Rika, Dan Vilenchik · 15 Dec 2024

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Shijie Huang, ..., Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu · LRM · 03 Dec 2024

GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle · 23 Nov 2024

Towards Utilising a Range of Neural Activations for Comprehending Representational Associations
Laura O'Mahony, Nikola S. Nikolov, David JP O'Sullivan · 15 Nov 2024

Local vs distributed representations: What is the right basis for interpretability?
Julien Colin, L. Goetschalckx, Thomas Fel, Victor Boutin, Jay Gopal, Thomas Serre, Nuria Oliver · HAI · 06 Nov 2024

FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation
Ziwei Zhan, Wenkuan Zhao, Yuanqing Li, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Chuan Wu, Deke Guo, Xu Chen · MoE · 04 Nov 2024

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
Luke Marks, Alasdair Paren, David M. Krueger, Fazl Barez · AAML · 02 Nov 2024