Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
arXiv:2502.12892
18 February 2025
Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, M. Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle

Papers citing "Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models" (19 of 19 shown):

Ensembling Sparse Autoencoders
Soham Gadgil, Chris Lin, Su-In Lee
21 May 2025

Interpreting the linear structure of vision-language model embedding spaces
Isabel Papadimitriou, Huangyuan Su, Thomas Fel, Naomi Saphra, Sham Kakade
16 Apr 2025 (Tags: VLM)

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba
03 Mar 2025

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Çağlar Gülçehre, David Bau
28 Oct 2024 (Tags: VLM)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun, Jordan K. Taylor, Nicholas Goldowsky-Dill, Lee D. Sharkey
17 May 2024

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
Usha Bhalla, Alexander X. Oesterling, Suraj Srinivas, Flavio du Pin Calmon, Himabindu Lakkaraju
16 Feb 2024

A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
Thomas Fel, Victor Boutin, Mazda Moayeri, Rémi Cadène, Louis Béthune, Léo Andéol, Mathieu Chalvidal, Thomas Serre
11 Jun 2023 (Tags: FAtt)

Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
27 Mar 2023 (Tags: CLIP, VLM)

Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure
Paul Novello, Thomas Fel, David Vigouroux
13 Jun 2022 (Tags: FAtt)

Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond
Anna Hedström, Leander Weber, Dilyara Bareeva, Daniel G. Krakowczyk, Franz Motzkus, Wojciech Samek, Sebastian Lapuschkin, Marina M.-C. Höhne
14 Feb 2022 (Tags: XAI, ELM)

Visual Representation Learning Does Not Generalize Strongly Within the Same Domain
Lukas Schott, Julius von Kügelgen, Frederik Träuble, Peter V. Gehler, Chris Russell, Matthias Bethge, Bernhard Schölkopf, Francesco Locatello, Wieland Brendel
17 Jul 2021 (Tags: OOD, DRL)

K-Deep Simplex: Deep Manifold Learning via Local Dictionaries
Pranay Tankala, Abiy Tasissa, James M. Murphy, Demba E. Ba
03 Dec 2020

RISE: Randomized Input Sampling for Explanation of Black-box Models
Vitali Petsiuk, Abir Das, Kate Saenko
19 Jun 2018 (Tags: FAtt)

On Identifiability of Nonnegative Matrix Factorization
Xiao Fu, Kejun Huang, N. Sidiropoulos
02 Sep 2017

Interpretable Explanations of Black Boxes by Meaningful Perturbation
Ruth C. Fong, Andrea Vedaldi
11 Apr 2017 (Tags: FAtt, AAML)

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
04 Mar 2017 (Tags: OOD, FAtt)

Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler, Rob Fergus
12 Nov 2013 (Tags: FAtt, SSL)

Learning Topic Models - Going beyond SVD
Sanjeev Arora, Rong Ge, Ankur Moitra
09 Apr 2012

Structured sparsity through convex optimization
Francis R. Bach, Rodolphe Jenatton, Julien Mairal, G. Obozinski
12 Sep 2011