Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov
6 October 2020 · MILM · arXiv:2010.02695

Papers citing "Analyzing Individual Neurons in Pre-trained Language Models"

35 / 35 papers shown

Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang, Yuyao Zhang, Yao Zhu, Jianing Li, Zizhe Wang, Yi Liu, Xiangyang Ji
31 Mar 2025 · 158 · 0 · 0

Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi, Chong Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren
12 Mar 2025 · ViT · 69 · 0 · 0

From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan, Matanel Oren, Yuval Reif, Roy Schwartz
08 Oct 2024 · 50 · 12 · 0

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart
27 Jul 2024 · 40 · 10 · 0

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024 · 82 · 19 · 0

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff
24 Jun 2024 · 33 · 1 · 0

What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn
03 May 2024 · 48 · 31 · 0

Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge
Xin Zhao, Naoki Yoshinaga, Daisuke Oba
08 Mar 2024 · KELM, HILM · 36 · 10 · 0

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Albert Garde, Esben Kran, Fazl Barez
03 Oct 2023 · 11 · 2 · 0

Redundancy and Concept Analysis for Code-trained Language Models
Arushi Sharma, Zefu Hu, Christopher Quinn, Ali Jannesari
01 May 2023 · 73 · 1 · 0

Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
Filip Klubička, Vasudevan Nedumpozhimana, John D. Kelleher
27 Apr 2023 · 41 · 4 · 0

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
22 Apr 2023 · MILM · 28 · 2 · 0

Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar
22 Jan 2023 · AI4CE · 35 · 3 · 0

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko
16 Jan 2023 · ELM, ReLM · 29 · 209 · 0

Interpreting Neural Networks through the Polytope Lens
Sid Black, Lee D. Sharkey, Léo Grinsztajn, Eric Winsor, Daniel A. Braun, ..., Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy
22 Nov 2022 · FAtt, MILM · 31 · 22 · 0

Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
14 Nov 2022 · MILM, MoE · 27 · 50 · 0

ConceptX: A Framework for Latent Concept Analysis
Firoj Alam, Fahim Dalvi, Nadir Durrani, Hassan Sajjad, A. Khan, Jia Xu
12 Nov 2022 · 33 · 5 · 0

Impact of Adversarial Training on Robustness and Generalizability of Language Models
Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla
10 Nov 2022 · AAML · 24 · 8 · 0

On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam
23 Oct 2022 · 32 · 18 · 0

Probing with Noise: Unpicking the Warp and Weft of Embeddings
Filip Klubička, John D. Kelleher
21 Oct 2022 · 30 · 4 · 0

Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
06 Sep 2022 · 24 · 83 · 0

Discovering Latent Concepts Learned in BERT
Fahim Dalvi, A. Khan, Firoj Alam, Nadir Durrani, Jia Xu, Hassan Sajjad
15 May 2022 · SSL · 11 · 56 · 0

On the Pitfalls of Analyzing Individual Neurons in Language Models
Omer Antverg, Yonatan Belinkov
14 Oct 2021 · MILM · 27 · 49 · 0

Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie Pavlick, Anders Søgaard
13 Sep 2021 · 22 · 114 · 0

Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations
Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar
13 Sep 2021 · 18 · 21 · 0

A Bayesian Framework for Information-Theoretic Probing
Tiago Pimentel, Ryan Cotterell
08 Sep 2021 · 28 · 24 · 0

Neuron-level Interpretation of Deep NLP Models: A Survey
Hassan Sajjad, Nadir Durrani, Fahim Dalvi
30 Aug 2021 · MILM, AI4CE · 35 · 80 · 0

What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis
Shammur A. Chowdhury, Nadir Durrani, Ahmed M. Ali
01 Jul 2021 · 41 · 12 · 0

How transfer learning impacts linguistic knowledge in deep NLP models?
Nadir Durrani, Hassan Sajjad, Fahim Dalvi
31 May 2021 · 13 · 49 · 0

An Interpretability Illusion for BERT
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg
14 Apr 2021 · MILM, FAtt · 40 · 68 · 0

Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
29 Dec 2020 · KELM · 39 · 745 · 0

Positional Artefacts Propagate Through Masked Language Model Embeddings
Ziyang Luo, Artur Kulmizev, Xiaoxi Mao
09 Nov 2020 · 29 · 41 · 0

Similarity Analysis of Contextual Word Representation Models
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass
03 May 2020 · 51 · 73 · 0

What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
03 May 2018 · 201 · 882 · 0

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · ELM · 299 · 6,984 · 0