N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models

22 April 2023
Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
MILM

Papers citing "N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models"

14 / 14 papers shown
System III: Learning with Domain Knowledge for Safety Constraints
Fazl Barez, Hosein Hasanbeig, Alessandro Abate
23 Apr 2023
Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML, MILM
21 Sep 2022
Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
28 Sep 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
KELM, MU
18 Apr 2021
An Interpretability Illusion for BERT
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg
MILM, FAtt
14 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
KELM
29 Dec 2020
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell
06 Oct 2020
Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov
MILM
06 Oct 2020
Compositional Explanations of Neurons
Jesse Mu, Jacob Andreas
FAtt, CoGe, MILM
24 Jun 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass
03 May 2020
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, A. Bau, James R. Glass
MILM
21 Dec 2018
Real Time Image Saliency for Black Box Classifiers
P. Dabkowski, Y. Gal
22 May 2017
Concrete Problems in AI Safety
Dario Amodei, C. Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dandelion Mané
21 Jun 2016