Rethinking Benchmarks for Cross-modal Image-text Retrieval

Rethinking Benchmarks for Cross-modal Image-text Retrieval

21 April 2023

Qin Jin

Papers citing "Rethinking Benchmarks for Cross-modal Image-text Retrieval"

15 / 15 papers shown

Title
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images Andreas Koukounas Georgios Mastrapas Bo Wang Mohammad Kalim Akram Sedigheh Eslami Michael Gunther Isabelle Mohr Saba Sturua Scott Martens Nan Wang VLM 260 9 0 11 Dec 2024
RedCaps: web-curated image-text data created by the people, for the people Karan Desai Gaurav Kaul Zubin Aysola Justin Johnson 90 166 0 22 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Yan Zeng Xinsong Zhang Hang Li VLM CLIP 51 303 0 16 Nov 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Jack Hessel Ari Holtzman Maxwell Forbes Ronan Le Bras Yejin Choi CLIP 117 1,545 0 18 Apr 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision Wonjae Kim Bokyung Son Ildoo Kim VLM CLIP 112 1,735 0 05 Feb 2021
Similarity Reasoning and Filtration for Image-Text Matching Haiwen Diao Ying Zhang Lingyun Ma Huchuan Lu 275 335 0 05 Jan 2021
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning Wei Li Can Gao Guocheng Niu Xinyan Xiao Hao Liu Jiachen Liu Hua Wu Haifeng Wang 78 378 0 31 Dec 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph Fei Yu Jiji Tang Weichong Yin Yu Sun Hao Tian Hua Wu Haifeng Wang 61 377 0 30 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning Zhe Gan Yen-Chun Chen Linjie Li Chen Zhu Yu Cheng Jingjing Liu ObjD VLM 56 494 0 11 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks Xiujun Li Xi Yin Chunyuan Li Pengchuan Zhang Xiaowei Hu ... Houdong Hu Li Dong Furu Wei Yejin Choi Jianfeng Gao VLM 84 1,934 0 13 Apr 2020
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval Sijin Wang Ruiping Wang Ziwei Yao Shiguang Shan Xilin Chen 3DV 74 211 0 11 Oct 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval Zihao Wang Xihui Liu Hongsheng Li Lu Sheng Junjie Yan Xiaogang Wang Jing Shao VLM 60 304 0 12 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu Dhruv Batra Devi Parikh Stefan Lee SSL VLM 215 3,667 0 06 Aug 2019
Stacked Cross Attention for Image-Text Matching Kuang-Huei Lee Xi Chen G. Hua Houdong Hu Xiaodong He 74 1,151 0 21 Mar 2018
Semi-Supervised Classification with Graph Convolutional Networks Thomas Kipf Max Welling GNN SSL 559 28,964 0 09 Sep 2016