Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

20 December 2022

Papers citing "Does CLIP Bind Concepts? Probing Compositionality in Large Image Models"

Compositional Image-Text Matching and Retrieval by Grounding Entities Madhukar Reddy Vongala Saurabh Srivastava Jana Kosecka CLIP CoGe VLM 41 0 0 04 May 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model Do Huu Dat Nam Hyeonu Po Yuan Mao Tae-Hyun Oh DiffM CoGe 71 0 0 02 May 2025
Human-like compositional learning of visually-grounded concepts using synthetic environments Zijun Lin M Ganesh Kumar Cheston Tan OCL CoGe 75 0 0 09 Apr 2025
Evaluating Compositional Scene Understanding in Multimodal Generative Models Shuhao Fu Andrew Jun Lee Anna Wang Ida Momennejad Trevor Bihl Hongjing Lu Taylor Webb CoGe OCL 109 1 0 29 Mar 2025
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models Davide Berasi Matteo Farina Massimiliano Mancini Elisa Ricci Nicola Strisciuglio CoGe 68 0 0 21 Mar 2025
Dynamic Relation Inference via Verb Embeddings Omri Suissa Muhiim Ali Ariana Azarbal Hui Shen Shekhar Pradhan 46 0 0 17 Mar 2025
On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis Hasnain Ali Saquib Sarfraz VLM 151 0 0 12 Mar 2025
Is CLIP ideal? No. Can we fix it? Yes! Raphi Kang Yue Song Georgia Gkioxari Pietro Perona VLM 63 0 0 10 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting Dominic Maggio Luca Carlone 219 0 0 07 Mar 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following Vivek Myers Bill Chunyuan Zheng Anca Dragan Kuan Fang Sergey Levine 72 0 0 08 Feb 2025
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios Shantanu Jaiswal Debaditya Roy Basura Fernando Cheston Tan ReLM LRM 79 2 0 20 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition Lorenzo Basile Valentino Maiorca Luca Bortolussi Emanuele Rodolà Francesco Locatello 60 1 0 31 Oct 2024
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem Declan Campbell Sunayana Rane Tyler Giallanza Nicolò De Sabbata Kia Ghods ... Alexander Ku Steven M. Frankland Thomas Griffiths Jonathan D. Cohen Taylor W. Webb 42 13 0 31 Oct 2024
A Complexity-Based Theory of Compositionality Eric Elmoznino Thomas Jiralerspong Yoshua Bengio Guillaume Lajoie CoGe 66 5 0 18 Oct 2024
Do Pre-trained Vision-Language Models Encode Object States? Kaleb Newman Shijie Wang Yuan Zang David Heffren Chen Sun CoGe 34 1 0 16 Sep 2024
Finetuning CLIP to Reason about Pairwise Differences Dylan Sam Devin Willmott João Dias Semedo J. Zico Kolter VLM 71 3 0 15 Sep 2024
What happens to diffusion model likelihood when your model is conditional? Mattias Cross Anton Ragni DiffM 42 0 0 10 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning Manu Gaur Darshan Singh Makarand Tapaswi 186 1 0 04 Sep 2024
Relational Composition in Neural Networks: A Survey and Call to Action Martin Wattenberg Fernanda Viégas CoGe 50 9 0 19 Jul 2024
Towards Compositionality in Concept Learning Adam Stein Aaditya Naik Yinjun Wu Mayur Naik Eric Wong CoGe 39 2 0 26 Jun 2024
Improving Interpretability and Robustness for the Detection of AI-Generated Images T. Gaintseva Laida Kushnareva German Magai Irina Piontkovskaya Sergey I. Nikolenko Martin Benning S. Barannikov Gregory Slabaugh 39 1 0 21 Jun 2024
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding Wufei Ma Guanning Zeng Guofeng Zhang Qihao Liu Letian Zhang Adam Kortylewski Yaoyao Liu Alan Yuille VLM 3DV 49 7 0 13 Jun 2024
When does compositional structure yield compositional generalization? A kernel theory Samuel Lippl Kim Stachenfeld NAI CoGe 73 6 0 26 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks Jacob Russin Sam Whitman McGrath Danielle J. Williams Lotem Elber-Dorozko AI4CE 83 3 0 24 May 2024
Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation Kevin Stangl Marius Arvinte Weilin Xu Cory Cornelius VLM UQCV 40 0 0 13 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward Raphael Milliere Cameron Buckner LRM 66 14 0 06 May 2024
Improving Concept Alignment in Vision-Language Concept Bottleneck Models Nithish Muthuchamy Selvaraj Xiaobao Guo Bingquan Shen A. Kong Alex C. Kot VLM 51 0 0 03 May 2024
Pre-trained Vision-Language Models Learn Discoverable Visual Concepts Yuan Zang Tian Yun Hao Tan Trung Bui Chen Sun VLM CoGe 63 9 0 19 Apr 2024
Probing the 3D Awareness of Visual Foundation Models Mohamed El Banani Amit Raj Kevis-Kokitsi Maninis Abhishek Kar Yuanzhen Li Michael Rubinstein Deqing Sun Leonidas J. Guibas Justin Johnson Varun Jampani 40 78 0 12 Apr 2024
Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP Reza Abbasi Mohammad Samiei M. Rohban M. Baghshah VLM CoGe 35 0 0 27 Mar 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions Reza Esfandiarpoor Cristina Menghini Stephen H. Bach CoGe VLM 40 8 0 25 Mar 2024
Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images Hansa Srinivasan Candice Schumann Aradhana Sinha David Madras Gbolahan O. Olanubi Alex Beutel Susanna Ricco Jilin Chen 40 5 0 25 Jan 2024
Grounded learning for compositional vector semantics Martha Lewis CoGe 42 0 0 10 Jan 2024
FoMo Rewards: Can we cast foundation models as reward functions? Ekdeep Singh Lubana Johann Brehmer P. D. Haan Taco S. Cohen OffRL LRM 48 2 0 06 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 39 8 0 21 Nov 2023
SelfEval: Leveraging the discriminative nature of generative models for evaluation Sai Saketh Rambhatla Ishan Misra EGVM 38 4 0 17 Nov 2023
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task Maya Okawa Ekdeep Singh Lubana Robert P. Dick Hidenori Tanaka CoGe DiffM 39 46 0 13 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models Vishaal Udandarao Max F. Burg Samuel Albanie Matthias Bethge VLM 39 9 0 12 Oct 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control Vivek Myers Andre Wang He Kuan Fang Homer Walke Philippe Hansen-Estruch Ching-An Cheng Mihai Jalobeanu Andrey Kolobov Anca Dragan Sergey Levine LM&Ro 32 29 0 30 Jun 2023
Are Diffusion Models Vision-And-Language Reasoners? Benno Krojer Elinor Poole-Dayan Vikram S. Voleti Christopher Pal Siva Reddy 45 13 0 25 May 2023
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning Wentao Bao Lichang Chen Heng-Chiao Huang Yu Kong CoGe VLM 33 12 0 23 May 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers Kevin Clark P. Jaini DiffM VLM 38 107 0 27 Mar 2023
ConceptFusion: Open-set Multimodal 3D Mapping Krishna Murthy Jatavallabhula Ali Kuwajerwala Qiao Gu Mohd. Omama Tao Chen ... Celso Miguel de Melo Madhava Krishna Liam Paull Florian Shkurti Antonio Torralba 35 232 0 14 Feb 2023
When are Lemons Purple? The Concept Association Bias of Vision-Language Models Yutaro Yamada Yingtian Tang Yoyo Zhang Ilker Yildirim CoGe 26 14 0 22 Dec 2022
Compositional Generalisation with Structured Reordering and Fertility Layers Matthias Lindemann Alexander Koller Ivan Titov CoGe 40 7 0 06 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong Guosheng Lin MLLM BDL VLM CLIP 392 4,171 0 28 Jan 2022
Memorisation versus Generalisation in Pre-trained Language Models Michael Tänzer Sebastian Ruder Marek Rei 94 50 0 16 Apr 2021
From Frequency to Meaning: Vector Space Models of Semantics Peter D. Turney Patrick Pantel 110 2,982 0 04 Mar 2010