2304.10824
Cited By
Rethinking Benchmarks for Cross-modal Image-text Retrieval
21 April 2023
Wei Chen
Linli Yao
Qin Jin
VLM
Papers citing "Rethinking Benchmarks for Cross-modal Image-text Retrieval"
15 / 15 papers shown
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
260
9
0
11 Dec 2024
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
90
166
0
22 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
51
303
0
16 Nov 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
117
1,545
0
18 Apr 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
112
1,735
0
05 Feb 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
275
335
0
05 Jan 2021
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
78
378
0
31 Dec 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
61
377
0
30 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
56
494
0
11 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
84
1,934
0
13 Apr 2020
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
74
211
0
11 Oct 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
60
304
0
12 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
215
3,667
0
06 Aug 2019
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
74
1,151
0
21 Mar 2018
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN
SSL
559
28,964
0
09 Sep 2016