ResearchTrend.AI

Understanding the Role of Individual Units in a Deep Neural Network (arXiv:2009.05041)
10 September 2020
David Bau
Jun-Yan Zhu
Hendrik Strobelt
Àgata Lapedriza
Bolei Zhou
Antonio Torralba
    GAN
Papers citing "Understanding the Role of Individual Units in a Deep Neural Network"

50 / 81 papers shown
What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
Jiamin Chang
Yiming Li
Hammond Pearce
Ruoxi Sun
Bo-wen Li
Minhui Xue
43
0
0
28 Apr 2025
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
Ling Hu
Yuemei Xu
Xiaoyang Gu
Letao Han
33
0
0
07 Apr 2025
Effective Skill Unlearning through Intervention and Abstention
Yongce Li
Chung-En Sun
Tsui-Wei Weng
MU
223
0
0
27 Mar 2025
Representational Similarity via Interpretable Visual Concepts
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
240
0
0
19 Mar 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi
Gal Niv
LRM
ReLM
65
0
0
03 Mar 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
69
0
0
17 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
157
0
0
17 Feb 2025
Dimensions underlying the representational alignment of deep neural networks with humans
F. Mahner
Lukas Muttenthaler
Umut Güçlü
M. Hebart
48
4
0
28 Jan 2025
Faithful Counterfactual Visual Explanations (FCVE)
Bismillah Khan
Syed Ali Tariq
Tehseen Zia
Muhammad Ahsan
David Windridge
44
0
0
12 Jan 2025
Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers
Syed Ali Tariq
Tehseen Zia
Mubeen Ghafoor
AAML
62
7
0
12 Jan 2025
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang
Anish Kachinthaya
Suzie Petryk
Yossi Gandelsman
VLM
36
17
0
03 Oct 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu
Yu-Xiang Lin
Tsui-Wei Weng
54
1
0
24 Jun 2024
Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models
Christopher Burger
Yifan Hu
Thai Le
KELM
49
0
0
22 Jun 2024
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
MILM
62
16
0
06 Jun 2024
Pruning for Robust Concept Erasing in Diffusion Models
Tianyun Yang
Juan Cao
Chang Xu
40
13
0
26 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
71
8
0
26 May 2024
Error-margin Analysis for Hidden Neuron Activation Labels
Abhilekha Dalal
R. Rayan
Pascal Hitzler
FAtt
31
1
0
14 May 2024
Linear Explanations for Individual Neurons
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt
MILM
31
6
0
10 May 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
39
18
0
22 Apr 2024
On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis
Abhilekha Dalal
R. Rayan
Adrita Barua
Eugene Y. Vasserman
Md Kamruzzaman Sarker
Pascal Hitzler
30
4
0
21 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
77
19
0
03 Apr 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
MILM
LRM
52
8
0
28 Feb 2024
Understanding the Role of Pathways in a Deep Neural Network
Lei Lyu
Chen Pang
Jihua Wang
35
3
0
28 Feb 2024
Deeper Understanding of Black-box Predictions via Generalized Influence Functions
Hyeonsu Lyu
Jonggyu Jang
Sehyun Ryu
H. Yang
TDI
AI4CE
27
5
0
09 Dec 2023
Conceptualizing the Relationship between AI Explanations and User Agency
Iyadunni Adenuga
Jonathan Dodge
29
2
0
05 Dec 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
35
27
0
26 Oct 2023
Unlearning with Fisher Masking
Yufang Liu
Changzhi Sun
Yuanbin Wu
Aimin Zhou
MU
23
5
0
09 Oct 2023
Explaining black box text modules in natural language with language models
Chandan Singh
Aliyah R. Hsu
Richard Antonello
Shailee Jain
Alexander G. Huth
Bin-Xia Yu
Jianfeng Gao
MILM
36
47
0
17 May 2023
LINe: Out-of-Distribution Detection by Leveraging Important Neurons
Yong Hyun Ahn
Gyeong-Moon Park
Seong Tae Kim
OODD
119
31
0
24 Mar 2023
P+: Extended Textual Conditioning in Text-to-Image Generation
A. Voynov
Qinghao Chu
Daniel Cohen-Or
Kfir Aberman
VLM
DiffM
51
176
0
16 Mar 2023
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment
Alejandro Peña
Ignacio Serna
Aythami Morales
Julian Fierrez
Alfonso Ortega
Ainhoa Herrarte
Manuel Alcántara
J. Ortega-Garcia
FaML
25
35
0
13 Feb 2023
PAMI: partition input and aggregate outputs for model interpretation
Wei Shi
Wentao Zhang
Weishi Zheng
Ruixuan Wang
FAtt
26
3
0
07 Feb 2023
Interpreting Robustness Proofs of Deep Neural Networks
Debangshu Banerjee
Avaljot Singh
Gagandeep Singh
AAML
29
5
0
31 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
44
2
0
26 Jan 2023
Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks
Fenglei Fan
Yingxin Li
Hanchuan Peng
T. Zeng
Fei Wang
25
5
0
23 Jan 2023
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Peter Hase
Joey Tianyi Zhou
Been Kim
Asma Ghandeharioun
MILM
48
167
0
10 Jan 2023
Correspondence Distillation from NeRF-based GAN
Yushi Lan
Chen Change Loy
Bo Dai
38
9
0
19 Dec 2022
Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators
Haitian Zheng
Zhe-nan Lin
Jingwan Lu
Scott D. Cohen
Eli Shechtman
...
Jianming Zhang
Qing Liu
Yuqian Zhou
Sohrab Amirghodsi
Jiebo Luo
DiffM
28
1
0
13 Dec 2022
On the Complexity of Bayesian Generalization
Yuge Shi
Manjie Xu
J. Hopcroft
Kun He
J. Tenenbaum
Song-Chun Zhu
Ying Nian Wu
Wenjuan Han
Yixin Zhu
30
4
0
20 Nov 2022
Data-Centric Debugging: mitigating model failures via targeted data collection
Sahil Singla
Atoosa Malemir Chegini
Mazda Moayeri
Soheil Feiz
27
4
0
17 Nov 2022
Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang
Kaiyue Wen
Zhengyan Zhang
Lei Hou
Zhiyuan Liu
Juanzi Li
MILM
MoE
29
51
0
14 Nov 2022
Emergence of Concepts in DNNs?
Tim Räz
21
0
0
11 Nov 2022
An Interactive Interpretability System for Breast Cancer Screening with Deep Learning
Yuzhe Lu
Adam Perer
26
3
0
30 Sep 2022
NeuCEPT: Locally Discover Neural Networks' Mechanism via Critical Neurons Identification with Precision Guarantee
Minh Nhat Vu
Truc D. T. Nguyen
My T. Thai
AAML
27
3
0
18 Sep 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
28
124
0
27 Jul 2022
Debiasing Deep Chest X-Ray Classifiers using Intra- and Post-processing Methods
Ricards Marcinkevics
Ece Ozkan
Julia E. Vogt
30
18
0
26 Jul 2022
Activation Template Matching Loss for Explainable Face Recognition
Huawei Lin
Haozhe Liu
Qiufu Li
Linlin Shen
CVBM
29
1
0
05 Jul 2022
Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values
Zijie J. Wang
Alex Kale
Harsha Nori
P. Stella
M. Nunnally
Duen Horng Chau
Mihaela Vorvoreanu
J. W. Vaughan
R. Caruana
KELM
69
27
0
30 Jun 2022
From Attribution Maps to Human-Understandable Explanations through Concept Relevance Propagation
Reduan Achtibat
Maximilian Dreyer
Ilona Eisenbraun
S. Bosse
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
FAtt
36
134
0
07 Jun 2022
DL4SciVis: A State-of-the-Art Survey on Deep Learning for Scientific Visualization
Chaoli Wang
J. Han
41
36
0
13 Apr 2022