arXiv: 2505.22255
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
28 May 2025
Vadim Kurochkin
Yaroslav Aksenov
Daniil Laptev
Daniil Gavrilov
Nikita Balagansky
Papers citing
"Train Sparse Autoencoders Efficiently by Utilizing Features Correlation"
13 / 13 papers shown
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen
Can Rager
Johnny Lin
Curt Tigges
Joseph Isaac Bloom
...
Matthew Wearden
Arthur Conmy
Samuel Marks
Neel Nanda
MU
140
22
0
12 Mar 2025
Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap
Tim Lawson
Lucy Farnik
Laurence Aitchison
57
16
0
29 Jan 2025
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide
Joshua Engels
Eric J. Michaud
Max Tegmark
Christian Schroeder de Witt
59
13
0
10 Oct 2024
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin
James Wilken-Smith
Tomáš Dulka
Hardik Bhatnagar
Joseph Bloom
80
35
0
22 Sep 2024
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Senthooran Rajamanoharan
Tom Lieberum
Nicolas Sonnerat
Arthur Conmy
Vikrant Varma
János Kramár
Neel Nanda
69
101
0
19 Jul 2024
Transcoders Find Interpretable LLM Feature Circuits
Jacob Dunefsky
Philippe Chlenski
Neel Nanda
60
34
0
17 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao
Tom Dupré la Tour
Henk Tillman
Gabriel Goh
Rajan Troll
Alec Radford
Ilya Sutskever
Jan Leike
Jeffrey Wu
74
145
0
06 Jun 2024
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham
Aidan Ewart
Logan Riggs
R. Huben
Lee Sharkey
MILM
95
421
0
15 Sep 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
99
1,273
0
03 Apr 2023
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
174
365
0
21 Sep 2022
Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators
S. Lowe
Robert C. Earle
Jason d'Eon
Thomas Trappenberg
Sageev Oore
43
2
0
22 Oct 2021
Kronecker Decomposition for GPT Compression
Ali Edalati
Marzieh S. Tahaei
Ahmad Rashid
V. Nia
J. Clark
Mehdi Rezagholizadeh
61
35
0
15 Oct 2021
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
246
2,643
0
23 Jan 2017