Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.06177
Cited By
Vision-Language Models under Cultural and Inclusive Considerations
8 July 2024
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Vision-Language Models under Cultural and Inclusive Considerations"
16 / 16 papers shown
Title
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
Minh Duc Bui
Katharina von der Wense
Anne Lauscher
VLM
56
1
0
06 Nov 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
78
38
0
10 Jun 2024
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Ricardo E Gonzalez Penuela
Jazmin Collins
Shiri Azenkot
Cynthia L. Bennett
62
25
0
22 Mar 2024
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models
Mor Ventura
Eyal Ben-David
Anna Korhonen
Roi Reichart
52
13
0
03 Oct 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
95
2,049
0
11 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
409
4,539
0
30 Jan 2023
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
81
717
0
14 Sep 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
123
548
0
27 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
129
76
0
25 May 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
71
98
0
31 Jan 2022
Multi-Modal Image Captioning for the Visually Impaired
Hiba Ahsan
Nikita Bhalla
Daivat Bhatt
Kaivankumar Shah
60
22
0
17 May 2021
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
61
182
0
20 Feb 2020
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
84
847
0
22 Feb 2018
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
92
1,914
0
29 Jul 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
205
2,475
0
01 Apr 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
274
4,481
0
20 Nov 2014
1