Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders

27 May 2025
James Oldfield, Shawn Im, Yixuan Li, M. Nicolaou, Ioannis Patras, Grigorios G. Chrysos
Tags: MoE

Papers citing "Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders"

13 papers shown

Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos, Yongtao Wu, Razvan Pascanu, Philip Torr, Volkan Cevher
Tags: AAML · Citations: 2 · 17 Apr 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, ..., Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda
Tags: MU · Citations: 19 · 12 Mar 2025

Closed-Form Feedback-Free Learning with Forward Projection
Robert O'Shea, Bipin Rajendran
Citations: 0 · 27 Jan 2025

Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels, Logan Riggs, Max Tegmark
Tags: LLMSV · Citations: 12 · 18 Oct 2024

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Isaac Bloom
Citations: 31 · 22 Sep 2024

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Jiankang Deng, Ioannis Patras
Tags: MoE · Citations: 9 · 19 Feb 2024

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
Citations: 350 · 07 Dec 2022

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa, C. Riquelme, J. Puigcerver, Rodolphe Jenatton, N. Houlsby
Tags: VLM, MoE · Citations: 190 · 06 Jun 2022

PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs
James Oldfield, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Ioannis Patras
Tags: GAN · Citations: 24 · 31 May 2022

ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus
Tags: MoE · Citations: 191 · 17 Feb 2022

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui
Tags: ALM, MoE · Citations: 794 · 13 Dec 2021

Dynamic Neural Networks: A Survey
Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang
Tags: 3DH, AI4TS, AI4CE · Citations: 638 · 09 Feb 2021

The Mythos of Model Interpretability
Zachary Chase Lipton
Tags: FaML · Citations: 3,672 · 10 Jun 2016