ResearchTrend.AI — Papers citing arXiv:2305.03689

© 2025 ResearchTrend.AI, All rights reserved.
COLA: A Benchmark for Compositional Text-to-image Retrieval
5 May 2023 · v3 (latest)
Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko
Tags: CoGe, VLM

Papers citing "COLA: A Benchmark for Compositional Text-to-image Retrieval"

30 / 30 papers shown

A Good CREPE needs more than just Sugar: Investigating Biases in Compositional Vision-Language Benchmarks
Vishaal Udandarao, Mehdi Cherti, Shyamgopal Karthik, J. Jitsev, Samuel Albanie, Matthias Bethge
CoGe — 18 · 0 · 0 — 09 Jun 2025

Diffusion Classifiers Understand Compositionality, but Conditions Apply
Yujin Jeong, Arnas Uselis, Seong Joon Oh, Anna Rohrbach
DiffM, CoGe — 562 · 0 · 3 — 23 May 2025

Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa
VLM — 88 · 2 · 0 — 17 Apr 2025

TMCIR: Token Merge Benefits Composed Image Retrieval
Chaoyang Wang, Zeyu Zhang, Long Teng, Zijun Li, Shichao Kan
105 · 0 · 0 — 15 Apr 2025

Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra, Kate Saenko, Venkatesh Saligrama
CoGe, LRM — 69 · 0 · 0 — 07 Apr 2025

VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang, Chao Wang, Yang Zhou, Yan Peng
LRM, ReLM — 144 · 0 · 0 — 02 Feb 2025

GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle
147 · 0 · 0 — 23 Nov 2024

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang
CoGe, CLIP — 137 · 14 · 0 — 04 Nov 2024

INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel J. Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn
VLM — 105 · 7 · 0 — 04 Nov 2024

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
Youngtaek Oh, Jae-Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim
VLM, CoGe, CLIP — 103 · 6 · 0 — 07 Oct 2024

EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections
Francesc Net, Lluís Gómez
63 · 0 · 0 — 02 Oct 2024

FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu, Xiaohu Yang, Weiwei Li, Peng Wang
ObjD — 134 · 5 · 0 — 23 Sep 2024

GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models
Zixuan Wu, Yoolim Kim, Carolyn Jane Anderson
VLM — 71 · 0 · 0 — 12 Aug 2024

The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
Tom Kouwenhoven, Max Peeperkorn, Bram van Dijk, Tessa Verhoef
58 · 4 · 0 — 25 Jul 2024

Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi, M. Rohban, M. Baghshah
CoGe — 81 · 8 · 0 — 08 Jul 2024

SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong
AI4TS — 72 · 0 · 0 — 06 Jul 2024

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan
EGVM, VGen — 136 · 40 · 0 — 26 Jun 2024

MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu
CoGe — 170 · 2 · 0 — 18 Jun 2024

BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
CoGe — 85 · 2 · 0 — 14 Jun 2024

Comparison Visual Instruction Tuning
Wei Lin, M. Jehanzeb Mirza, Sivan Doveh, Rogerio Feris, Raja Giryes, Sepp Hochreiter, Leonid Karlinsky
91 · 4 · 0 — 13 Jun 2024

Position: Do Not Explain Vision Models Without Context
Paulina Tomaszewska, Przemysław Biecek
66 · 1 · 0 — 28 Apr 2024

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville
85 · 1 · 0 — 24 Apr 2024

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott D. Cohen, Jiebo Luo
CoGe, VLM — 92 · 12 · 0 — 23 Apr 2024

Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu, M. Jung, Jesse Clark
134 · 1 · 0 — 12 Apr 2024

Do Vision-Language Models Understand Compound Nouns?
Sonal Kumar, Sreyan Ghosh, S. Sakshi, Utkarsh Tyagi, Dinesh Manocha
CLIP, CoGe, VLM — 82 · 1 · 0 — 30 Mar 2024

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
VLM, ObjD — 113 · 36 · 0 — 04 Mar 2024

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea
MLLM, CoGe, VLM — 76 · 6 · 0 — 22 Feb 2024

FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. Darshan Singh, Zeeshan Khan, Makarand Tapaswi
VLM, CLIP — 67 · 3 · 0 — 15 Jan 2024

3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek, Leonid Karlinsky, Raja Giryes
CoGe, VLM — 296 · 3 · 0 — 28 Dec 2023

CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao
VLM, CLIP — 348 · 1,060 · 0 — 09 Oct 2021