Disentangling visual and written concepts in CLIP

15 June 2022

Antonio Torralba

Papers citing "Disentangling visual and written concepts in CLIP"

Transformation of audio embeddings into interpretable, concept-based representations Alice Zhang Edison Thomaz Lie Lu 29 0 0 18 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders Sonia Joseph Praneet Suresh Ethan Goldfarb Lorenz Hufe Yossi Gandelsman Robert Graham Danilo Bzdok Wojciech Samek Blake A. Richards 56 2 0 11 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models Justus Westerhoff Erblina Purellku Jakob Hackstein Jonas Loos Leo Pinetzki Lorenz Hufe AAML 33 0 0 07 Apr 2025
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID Xin Liang Yogesh S Rawat 90 0 0 28 Mar 2025
Zero-Shot Visual Concept Blending Without Text Guidance Hiroya Makino Takahiro Yamaguchi Hiroyuki Sakai DiffM 50 0 0 27 Mar 2025
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing Achint Soni Meet Soni Sirisha Rambhatla DiffM 66 0 0 27 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models Maan Qraitem Piotr Teterwak Kate Saenko Bryan A. Plummer AAML 80 0 0 17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models Tobia Poppi Tejaswi Kasarla Pascal Mettes Lorenzo Baraldi Rita Cucchiara VLM MU 66 0 0 15 Mar 2025
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook Meng Yang Tianqing Zhu Chi Liu Wanlei Zhou Shui Yu Philip S. Yu AAML ELM PILM 69 1 0 12 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 45 3 0 09 Oct 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling William Y. Zhu Keren Ye Junjie Ke Jiahui Yu Leonidas J. Guibas P. Milanfar Feng Yang 51 2 0 07 Aug 2024
DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning Dino Ienco C. Dantas 44 3 0 05 Aug 2024
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval Gangyan Zeng Yuan Zhang Jin Wei Dongbao Yang Peng Zhang Yiwen Gao Xugong Qin Yu Zhou VLM CLIP 32 0 0 01 Aug 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Sachit Menon Richard Zemel Carl Vondrick LRM 45 2 0 20 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models Simon Schrodi David T. Hoffmann Max Argus Volker Fischer Thomas Brox VLM 58 1 0 11 Apr 2024
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale Jinbin Huang Chong Chen Aditi Mishra Bum Chul Kwon Zhicheng Liu Chris Bryan 50 4 0 03 Apr 2024
Scene Depth Estimation from Traditional Oriental Landscape Paintings Sungho Kang Yeonghyeon Park H. Park Juneho Yi 52 0 0 06 Mar 2024
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback Xin Jin Bo Li Baao Xie Wenyao Zhang Jinming Liu Ziqiang Li Tao Yang Wenjun Zeng DRL DiffM CoGe 42 7 0 04 Feb 2024
Parrot Captions Teach CLIP to Spot Text Yiqi Lin Conghui He Alex Jinpeng Wang Bin Wang Weijia Li Mike Zheng Shou 38 7 0 21 Dec 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models Samuele Poppi Tobia Poppi Federico Cocchi Marcella Cornia Lorenzo Baraldi Rita Cucchiara VLM 27 9 0 27 Nov 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models Morris Alper Hadar Averbuch-Elor 48 10 0 25 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition Yossi Gandelsman Alexei A. Efros Jacob Steinhardt VLM 21 83 0 09 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons Jing-ling Huang Atticus Geiger Karel D'Oosterlinck Zhengxuan Wu Christopher Potts MILM 29 26 0 19 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval Nina Shvetsova Anna Kukleva Bernt Schiele Hilde Kuehne DiffM 33 3 0 16 Sep 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models James Oldfield Christos Tzelepis Yannis Panagakis M. Nicolaou Ioannis Patras 26 9 0 23 May 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs Aleksandar Shtedritski Christian Rupprecht Andrea Vedaldi VLM MLLM 32 142 0 13 Apr 2023
Defense-Prefix for Preventing Typographic Attacks on CLIP Hiroki Azuma Yusuke Matsui VLM AAML 20 17 0 10 Apr 2023
Zero-shot Model Diagnosis Jinqi Luo Zhaoning Wang Chen Henry Wu Dong Huang Fernando de la Torre VLM 24 21 0 27 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation Sara Sarto Manuele Barraco Marcella Cornia Lorenzo Baraldi Rita Cucchiara 21 55 0 21 Mar 2023
SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective Zipeng Xu Songlong Xing E. Sangineto N. Sebe CLIP 30 2 0 16 Mar 2023
Teaching CLIP to Count to Ten Roni Paiss Ariel Ephrat Omer Tov Shiran Zada Inbar Mosseri Michal Irani Tali Dekel VLM CLIP 39 93 0 23 Feb 2023
CLIPPO: Image-and-Language Understanding from Pixels Only Michael Tschannen Basil Mustafa N. Houlsby CLIP VLM 32 48 0 15 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally? Zixian Ma Jerry Hong Mustafa Omer Gul Mona Gandhi Irena Gao Ranjay Krishna CoGe 37 125 0 13 Dec 2022
Task Bias in Vision-Language Models Sachit Menon I. Chandratreya Carl Vondrick VLM SSL 27 6 0 08 Dec 2022
Disentangled Representation Learning Xin Eric Wang Hong Chen Siao Tang Zihao Wu Wenwu Zhu DRL 39 78 0 21 Nov 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention Raphael Tang Linqing Liu Akshat Pandey Zhiying Jiang Gefei Yang K. Kumar Pontus Stenetorp Jimmy J. Lin Ferhan Ture 34 167 0 10 Oct 2022
Zero-Shot Text-to-Image Generation Aditya A. Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen Ilya Sutskever VLM 255 4,805 0 24 Feb 2021
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 307 39,238 0 01 Sep 2014