2411.01220
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
2 November 2024
Luke Marks
Alasdair Paren
David M. Krueger
Fazl Barez
AAML
Papers citing
"Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders"
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
174
33
0
02 Jul 2024
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
75
29
0
19 Sep 2023
Interpreting Neural Networks through the Polytope Lens
Sid Black
Lee D. Sharkey
Léo Grinsztajn
Eric Winsor
Daniel A. Braun
...
Kip Parker
Carlos Ramón Guevara
Beren Millidge
Gabriel Alfour
Connor Leahy
FAtt
MILM
72
26
0
22 Nov 2022
Engineering Monosemanticity in Toy Models
Adam Jermyn
Nicholas Schiefer
Evan Hubinger
MILM
52
10
0
16 Nov 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
198
380
0
21 Sep 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
476
2,123
0
31 Dec 2020
Deep Co-Training for Semi-Supervised Image Recognition
Siyuan Qiao
Wei Shen
Zhishuai Zhang
Bo Wang
Alan Yuille
64
451
0
15 Mar 2018
Fraternal Dropout
Konrad Zolna
Devansh Arpit
Dendi Suhubdy
Yoshua Bengio
52
53
0
31 Oct 2017
Deep Mutual Learning
Ying Zhang
Tao Xiang
Timothy M. Hospedales
Huchuan Lu
FedML
155
1,656
0
01 Jun 2017
Network Dissection: Quantifying Interpretability of Deep Visual Representations
David Bau
Bolei Zhou
A. Khosla
A. Oliva
Antonio Torralba
MILM
FAtt
158
1,526
1
19 Apr 2017
Temporal Ensembling for Semi-Supervised Learning
S. Laine
Timo Aila
UQCV
192
2,570
0
07 Oct 2016