N2G: A Scalable Approach for Quantifying Interpretable Neuron
Representations in Large Language Models

22 April 2023

Papers citing "N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models"

System III: Learning with Domain Knowledge for Safety Constraints Fazl Barez Hosein Hasanbeig Alessandro Abate 61 4 0 23 Apr 2023
Toy Models of Superposition Nelson Elhage Tristan Hume Catherine Olsson Nicholas Schiefer T. Henighan ... Sam McCandlish Jared Kaplan Dario Amodei Martin Wattenberg C. Olah AAML MILM 193 378 0 21 Sep 2022
Unsolved Problems in ML Safety Dan Hendrycks Nicholas Carlini John Schulman Jacob Steinhardt 244 293 0 28 Sep 2021
Knowledge Neurons in Pretrained Transformers Damai Dai Li Dong Y. Hao Zhifang Sui Baobao Chang Furu Wei KELM MU 97 463 0 18 Apr 2021
An Interpretability Illusion for BERT Tolga Bolukbasi Adam Pearce Ann Yuan Andy Coenen Emily Reif Fernanda Viégas Martin Wattenberg MILM FAtt 77 80 0 14 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 475 2,120 0 31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories Mor Geva R. Schuster Jonathan Berant Omer Levy KELM 170 843 0 29 Dec 2020
Intrinsic Probing through Dimension Selection Lucas Torroba Hennigen Adina Williams Ryan Cotterell 56 58 0 06 Oct 2020
Analyzing Individual Neurons in Pre-trained Language Models Nadir Durrani Hassan Sajjad Fahim Dalvi Yonatan Belinkov MILM 60 104 0 06 Oct 2020
Compositional Explanations of Neurons Jesse Mu Jacob Andreas FAtt CoGe MILM 69 178 0 24 Jun 2020
Similarity Analysis of Contextual Word Representation Models John M. Wu Yonatan Belinkov Hassan Sajjad Nadir Durrani Fahim Dalvi James R. Glass 95 75 0 03 May 2020
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models Fahim Dalvi Nadir Durrani Hassan Sajjad Yonatan Belinkov A. Bau James R. Glass MILM 64 192 0 21 Dec 2018
Real Time Image Saliency for Black Box Classifiers P. Dabkowski Y. Gal 70 592 0 22 May 2017
Concrete Problems in AI Safety Dario Amodei C. Olah Jacob Steinhardt Paul Christiano John Schulman Dandelion Mané 244 2,404 0 21 Jun 2016