2504.08842
Cited By
v1
v2 (latest)
Towards Combinatorial Interpretability of Neural Computation
10 April 2025
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
Papers citing "Towards Combinatorial Interpretability of Neural Computation"
20 / 20 papers shown
Title
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
69
0
0
03 Jun 2025
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Lovis Heindrich
Philip Torr
Fazl Barez
Veronika Thost
142
2
0
27 Feb 2025
On the Complexity of Neural Computation in Superposition
Micah Adler
Nir Shavit
231
4
0
05 Sep 2024
Mathematical Models of Computation in Superposition
Kaarel Hänni
Jake Mendel
Dmitry Vaintrob
Lawrence Chan
SupR
92
11
0
10 Aug 2024
Validating Mechanistic Interpretations: An Axiomatic Approach
Nils Palumbo
Ravi Mangal
Zifan Wang
Saranya Vijayakumar
Corina S. Pasareanu
Somesh Jha
106
1
0
18 Jul 2024
Transcoders Find Interpretable LLM Feature Circuits
Jacob Dunefsky
Philippe Chlenski
Neel Nanda
85
34
0
17 Jun 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge
Fukang Zhu
Wentao Shu
Junxuan Wang
Zhengfu He
Xipeng Qiu
96
10
0
22 May 2024
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan
Arthur Conmy
Lewis Smith
Tom Lieberum
Vikrant Varma
János Kramár
Rohin Shah
Neel Nanda
RALM
82
94
0
24 Apr 2024
What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes
Victor Lecomte
Kushal Thaman
Rylan Schaeffer
Naomi Bashkansky
Trevor Chow
Sanmi Koyejo
AAML
MILM
71
12
0
05 Dec 2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham
Aidan Ewart
Logan Riggs
R. Huben
Lee Sharkey
MILM
143
449
0
15 Sep 2023
Progress measures for grokking via mechanistic interpretability
Neel Nanda
Lawrence Chan
Tom Lieberum
Jess Smith
Jacob Steinhardt
115
451
0
12 Jan 2023
Polysemanticity and Capacity in Neural Networks
Adam Scherlis
Kshitij Sachan
Adam Jermyn
Joe Benton
Buck Shlegeris
MILM
240
32
0
04 Oct 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
242
380
0
21 Sep 2022
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
673
4,945
0
23 Jan 2020
A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld
Amir Rosenfeld
Yonatan Belinkov
Nir Shavit
113
215
0
27 Sep 2019
Similarity of Neural Network Representations Revisited
Simon Kornblith
Mohammad Norouzi
Honglak Lee
Geoffrey E. Hinton
188
1,441
0
01 May 2019
Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks
Anh Totti Nguyen
J. Yosinski
Jeff Clune
120
330
0
11 Feb 2016
Deep Learning and the Information Bottleneck Principle
Naftali Tishby
Noga Zaslavsky
DRL
270
1,600
0
09 Mar 2015
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan
Andrea Vedaldi
Andrew Zisserman
FAtt
343
7,340
0
20 Dec 2013
Representation Learning: A Review and New Perspectives
Yoshua Bengio
Aaron Courville
Pascal Vincent
OOD
SSL
336
12,496
0
24 Jun 2012