Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.12522
Cited By
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
25 May 2022
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset"
18 / 18 papers shown
Title
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
70
0
0
12 Mar 2025
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
103
7
0
11 Dec 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
69
21
0
18 Oct 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
51
7
0
08 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
60
21
0
27 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
84
9
0
14 Jun 2024
Image captioning in different languages
Emiel van Miltenburg
VLM
39
0
0
31 May 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
40
1
0
29 Mar 2024
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Sanchit Ahuja
Divyanshu Aggarwal
Varun Gumma
Ishaan Watts
Ashutosh Sathe
...
Rishav Hada
Prachi Jain
Maxamed Axmed
Kalika Bali
Sunayana Sitaram
ELM
34
39
0
13 Nov 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
50
3
0
22 Oct 2023
C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap
William Theisen
Walter J. Scheirer
CLIP
VLM
29
2
0
06 Sep 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
29
25
0
19 Aug 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
51
187
0
29 May 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
19
940
0
27 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
40
19
0
17 Mar 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
44
21
0
22 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
27
29
0
16 Feb 2023
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
18
19
0
20 Dec 2022
1