2304.12918
Cited By
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
22 April 2023
Alex Foote
Neel Nanda
Esben Kran
Ioannis Konstas
Fazl Barez
MILM
Papers citing "N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models"
14 / 14 papers shown
Title
System III: Learning with Domain Knowledge for Safety Constraints
Fazl Barez
Hosein Hasanbeig
Alessandro Abate
61
4
0
23 Apr 2023
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
193
378
0
21 Sep 2022
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
244
293
0
28 Sep 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
97
463
0
18 Apr 2021
An Interpretability Illusion for BERT
Tolga Bolukbasi
Adam Pearce
Ann Yuan
Andy Coenen
Emily Reif
Fernanda Viégas
Martin Wattenberg
MILM
FAtt
77
80
0
14 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
475
2,120
0
31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
170
843
0
29 Dec 2020
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen
Adina Williams
Ryan Cotterell
56
58
0
06 Oct 2020
Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani
Hassan Sajjad
Fahim Dalvi
Yonatan Belinkov
MILM
60
104
0
06 Oct 2020
Compositional Explanations of Neurons
Jesse Mu
Jacob Andreas
FAtt
CoGe
MILM
69
178
0
24 Jun 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu
Yonatan Belinkov
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
James R. Glass
95
75
0
03 May 2020
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
Yonatan Belinkov
A. Bau
James R. Glass
MILM
64
192
0
21 Dec 2018
Real Time Image Saliency for Black Box Classifiers
P. Dabkowski
Y. Gal
70
592
0
22 May 2017
Concrete Problems in AI Safety
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
244
2,404
0
21 Jun 2016