Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

16 February 2024

Usha Bhalla

Alexander X. Oesterling

Suraj Srinivas

Flavio du Pin Calmon

Himabindu Lakkaraju


Papers citing "Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)"

28 / 28 papers shown

Title | Authors | Tags | Counts | Date
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning | Siyi Chen, Yimeng Zhang, Sijia Liu, Q. Qu | AAML | 144 0 0 | 30 Apr 2025
Transformation of audio embeddings into interpretable, concept-based representations | Alice Zhang, Edison Thomaz, Lie Lu | - | 29 0 0 | 18 Apr 2025
Interpreting the Linear Structure of Vision-language Model Embedding Spaces | Isabel Papadimitriou, Huangyuan Su, Thomas Fel, Naomi Saphra, Sham Kakade, Stephanie Gil | VLM | 52 0 0 | 16 Apr 2025
An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings | Matías Piqueras, Alexandra Segerberg, Matteo Magnani, Måns Magnusson, Nataša Sladoje | - | 43 0 0 | 14 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders | Sonia Joseph, Praneet Suresh, Ethan Goldfarb, Lorenz Hufe, Yossi Gandelsman, Robert Graham, Danilo Bzdok, Wojciech Samek, Blake A. Richards | - | 51 2 0 | 11 Apr 2025
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata | VLM | 66 0 0 | 03 Apr 2025
Zero-Shot Visual Concept Blending Without Text Guidance | Hiroya Makino, Takahiro Yamaguchi, Hiroyuki Sakai | DiffM | 43 0 0 | 27 Mar 2025
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models | Fernando Julio Cendra, Kai Han | VLM | 58 0 0 | 25 Mar 2025
An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection | Louis Y. Kim, Michelle Karker, Victoria Valledor, Seiyoung C. Lee, Karl F. Brzoska, Margaret Duff, Anthony Palladino | VLM ObjD | 56 0 0 | 21 Mar 2025
An interpretable approach to automating the assessment of biofouling in video footage | Evelyn J. Mannix, Bartholomew A. Woodham | - | 63 0 0 | 17 Mar 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders | Vladimir Zaigrajew, Hubert Baniecki, P. Biecek | - | 51 0 0 | 27 Feb 2025
Model-agnostic Coreset Selection via LLM-based Concept Bottlenecks | Akshay Mehra, Trisha Mittal, Subhadra Gopalakrishnan, Joshua Kimball | - | 45 0 0 | 23 Feb 2025
Multi-Faceted Multimodal Monosemanticity | Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang | - | 44 0 0 | 16 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment | Harrish Thasarathan, Julian Forsyth, Thomas Fel, M. Kowal, Konstantinos G. Derpanis | - | 111 7 0 | 06 Feb 2025
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey | Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, S. Huang, ..., Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu | LRM | 101 15 0 | 03 Dec 2024
Towards Unifying Interpretability and Control: Evaluation via Intervention | Usha Bhalla, Suraj Srinivas, Asma Ghandeharioun, Himabindu Lakkaraju | - | 40 5 0 | 07 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition | Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco Locatello | - | 48 1 0 | 31 Oct 2024
WASP: A Weight-Space Approach to Detecting Learned Spuriousness | Cristian Daniel Păduraru, Antonio Bărbălău, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu | - | 25 0 0 | 24 Oct 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment | Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng-Wei Zhang, Tong He | VLM | 42 4 0 | 25 Jul 2024
Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers | Alex Oesterling, Usha Bhalla, Suresh Venkatasubramanian, Himabindu Lakkaraju | - | 46 1 0 | 11 Jul 2024
DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor | Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, Shiqi Wang | - | 40 1 0 | 12 Jun 2024
A Concept-Based Explainability Framework for Large Multimodal Models | Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, A. Newson, Matthieu Cord | - | 40 16 0 | 12 Jun 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment | Zijia Song, Z. Zang, Yelin Wang, Guozheng Yang, Jiangbin Zheng, Kaicheng Yu, Wanyu Chen, Stan Z. Li | - | 33 0 0 | 09 Jun 2024
I Bet You Did Not Mean That: Testing Semantic Importance via Betting | Jacopo Teneggi, Jeremias Sulam | FAtt | 33 1 0 | 29 May 2024
Crafting Interpretable Embeddings by Asking LLMs Questions | Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao | - | 26 5 0 | 26 May 2024
Linearly Mapping from Image to Text Space | Jack Merullo, Louis Castricato, Carsten Eickhoff, Ellie Pavlick | VLM | 164 104 0 | 30 Sep 2022
Post-hoc Concept Bottleneck Models | Mert Yuksekgonul, Maggie Wang, James Zou | - | 145 185 0 | 31 May 2022
On Interpretability of Deep Learning based Skin Lesion Classifiers using Concept Activation Vectors | Adriano Lucieri, Muhammad Naseer Bajwa, S. Braun, M. I. Malik, Andreas Dengel, Sheraz Ahmed | MedIm | 163 64 0 | 05 May 2020