Ensembling Sparse Autoencoders

21 May 2025

Papers citing "Ensembling Sparse Autoencoders"

Title
Learning Multi-Level Features with Matryoshka Sparse Autoencoders | Bart Bussmann, Noa Nabeshima, Adam Karvonen, Neel Nanda | | 80 7 0 | 21 Mar 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability | Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, ..., Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda | MU | 117 18 0 | 12 Mar 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry | Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba | | 64 6 0 | 03 Mar 2025
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models | Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, M. Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle | | 35 3 0 | 18 Feb 2025
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders | David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Bloom | | 60 28 0 | 22 Sep 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller | | 64 137 0 | 28 Mar 2024
Gender bias and stereotypes in Large Language Models | Hadas Kotek, Rikker Dockum, David Q. Sun | | 57 220 0 | 28 Aug 2023
A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation | Thomas Fel, Victor Boutin, Mazda Moayeri, Rémi Cadène, Louis Bethune, Léo Andéol, Mathieu Chalvidal, Thomas Serre | FAtt | 31 57 0 | 11 Jun 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 342 2,051 0 | 31 Dec 2020