SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

26 June 2023
Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, Ranjay Krishna
CoGe

Papers citing "SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality"

46 / 96 papers shown
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
CoGe · 14 Jun 2024
Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition
Youngtaek Oh, Pyunghwan Ahn, Jinhyung Kim, Gwangmo Song, Soonyoung Lee, In So Kweon, Junmo Kim
CoGe · 13 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, ..., Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky
CoGe, LRM · 12 Jun 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang, Shichao Dong, Yapeng Zhu, Kelu Yao, Weidong Zhao, Chao Li, Ping Luo
CoGe, LRM · 27 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko
AI4CE · 24 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei, Zixuan Pan, Andrew Owens
VLM · 14 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere, Cameron Buckner
LRM · 06 May 2024
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Shama Sastry, E. Milios, Sageev Oore, Hassan Sajjad
VLM, CoGe · 25 Apr 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville
24 Apr 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott D. Cohen, Jiebo Luo
CoGe, VLM · 23 Apr 2024
Pre-trained Vision-Language Models Learn Discoverable Visual Concepts
Yuan Zang, Tian Yun, Hao Tan, Trung Bui, Chen Sun
VLM, CoGe · 19 Apr 2024
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan, B. Vijaykumar, S. Schulter, Yun Fu, Manmohan Chandraker
LRM, ReLM · 06 Apr 2024
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna
VLM, CoGe · 02 Apr 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via
  Negations
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
34
14
0
29 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
VLM, CLIP · 25 Mar 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach
CoGe, VLM · 25 Mar 2024
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, P. D. Haghighi, Hamid Rezatofighi
LRM · 19 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
VLM, ObjD · 04 Mar 2024
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Ameya Prabhu, Vishaal Udandarao, Philip Torr, Matthias Bethge, Adel Bibi, Samuel Albanie
29 Feb 2024
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea
MLLM, CoGe, VLM · 22 Feb 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
LRM · 20 Feb 2024
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch, Narunas Vaskevicius, Mirco Colosi, Pedro Hermosilla, Timo Ropinski
3DPC · 19 Feb 2024
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto
14 Feb 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh, Zeeshan Khan, Makarand Tapaswi
VLM, CLIP · 15 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
VLM, MLLM · 11 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić, Sergi Caelles, Michael Tschannen
LRM, VLM · 03 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao, Long Zhao, Vijay Kumar B.G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, S. Schulter
ObjD, VLM · 29 Dec 2023
Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong
22 Dec 2023
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim, Yeonsik Jo, Jinhyung Kim, S. Kim
VLM · 19 Dec 2023
A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics
Xiangru Zhu, Penglei Sun, Chengyu Wang, Jingping Liu, Zhixu Li, Yanghua Xiao, Jun Huang
CoGe · 04 Dec 2023
What's "up" with vision-language models? Investigating their struggle
  with spatial reasoning
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Amita Kamath
Jack Hessel
Kai-Wei Chang
LRM
CoGe
19
98
0
30 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
CoGe · 13 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge
VLM · 12 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor, Alexandre Ramé, Corentin Dancette, Matthieu Cord
LRM, MLLM · 01 Oct 2023
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Fei-Yue Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
24 Aug 2023
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma, Rong Li, Junwei Liang
CoGe · 21 Aug 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
CoGe, VLM · 15 Jun 2023
Image Captioners Are Scalable Vision Learners Too
Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, N. Houlsby, Lucas Beyer
VLM, CLIP · 13 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan
VLM · 02 Jun 2023
Compositional diversity in visual concept learning
Yanli Zhou, Reuben Feinman, Brenden M. Lake
CoGe, OCL · 30 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi, Aishwarya Agrawal
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath, Jack Hessel, Kai-Wei Chang
CoGe, CLIP, VLM · 24 May 2023
Probing Conceptual Understanding of Large Visual-Language Models
Madeline Chantry Schiappa, Raiyaan Abdullah, Shehreen Azad, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh S Rawat
07 Apr 2023
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick
VLM, OCL, CoGe · 20 Dec 2022
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald
CoGe · 01 Nov 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, Guosheng Lin
MLLM, BDL, VLM, CLIP · 28 Jan 2022