arXiv:1505.00468 (v7, latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Won't Get Fooled Again: Answering Questions with False Premises
Shengding Hu
Yi-Xiao Luo
Huadong Wang
Xingyi Cheng
Zhiyuan Liu
Maosong Sun
90
29
0
05 Jul 2023
Interactive Image Segmentation with Cross-Modality Vision Transformers
Kun Li
G. Vosselman
M. Yang
ViT
77
4
0
05 Jul 2023
SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Lasha Abzianidze
J. Zwarts
Yoad Winter
34
2
0
05 Jul 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Chenliang Li
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
VLM
MLLM
89
128
0
04 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
115
52
0
03 Jul 2023
Localized Questions in Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
71
8
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
131
120
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
108
4
0
03 Jul 2023
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo
Viktor Pfanschilling
Devendra Singh Dhami
Kristian Kersting
NAI
87
9
0
03 Jul 2023
HeGeL: A Novel Dataset for Geo-Location from Hebrew Text
Tzuf Paz-Argaman
Tal Bauman
Itai Mondshine
Itzhak Omer
S. Dalyot
Reut Tsarfaty
69
3
0
02 Jul 2023
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment
Yanjiang Guo
Yen-Jen Wang
Lihan Zha
Zheyuan Jiang
Jianyu Chen
LM&Ro
115
41
0
01 Jul 2023
Multimodal Prompt Retrieval for Generative Visual Question Answering
Timothy Ossowski
Junjie Hu
38
1
0
30 Jun 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
103
238
0
29 Jun 2023
Unified Language Representation for Question Answering over Text, Tables, and Images
Yu Bowen
Cheng Fu
Haiyang Yu
Fei Huang
Yongbin Li
LMTD
75
23
0
29 Jun 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
69
7
0
29 Jun 2023
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
Alireza Salemi
Mahta Rafiee
Hamed Zamani
69
10
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Yiyi Zhou
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
42
2
0
27 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
155
652
0
27 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
45
5
0
25 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
138
613
0
23 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
156
860
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
122
173
0
22 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
Elizaveta Semenova
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
125
46
0
21 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
161
246
0
21 Jun 2023
ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
Jungwoo Oh
Gyubok Lee
Seongsu Bae
Joon-Myoung Kwon
Edward Choi
100
19
0
21 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
105
3
0
20 Jun 2023
GenPlot: Increasing the Scale and Diversity of Chart Derendering Data
Brendan Artley
68
1
0
20 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning
Shivaen Ramshetty
Gaurav Verma
Srijan Kumar
80
2
0
19 Jun 2023
Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning
Michael Tang
David Shustin
DiffM
133
0
0
19 Jun 2023
A neuro-symbolic approach for multimodal reference expression comprehension
Aman Jain
Anirudh Reddy Kondapally
Kentaro Yamada
Hitomi Yanaka
36
2
0
19 Jun 2023
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
Shuo Chen
Yingjun Du
Pascal Mettes
Cees G. M. Snoek
OffRL
132
4
0
16 Jun 2023
Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions
Chad DeChant
Iretiayo Akinola
Daniel Bauer
88
8
0
16 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
147
13
0
16 Jun 2023
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation
Harel Biggie
Ajay Narasimha Mopidevi
Dusty Woods
Christoffer Heckman
LM&Ro
67
11
0
15 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
90
44
0
15 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
86
173
0
15 Jun 2023
Improving Selective Visual Question Answering by Learning from Your Peers
Corentin Dancette
Spencer Whitehead
Rishabh Maheshwary
Ramakrishna Vedantam
Stefan Scherer
Xinlei Chen
Matthieu Cord
Marcus Rohrbach
AAML
OOD
89
17
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
86
7
0
14 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
104
76
0
14 Jun 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
141
6
0
13 Jun 2023
Visual Question Answering (VQA) on Images with Superimposed Text
V. Kodali
Daniel Berleant
47
1
0
13 Jun 2023
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Lukas Helff
Wolfgang Stammer
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
NAI
89
5
0
13 Jun 2023
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
Raz Lapid
Moshe Sipper
AAML
110
17
0
13 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
94
211
0
11 Jun 2023
Weakly Supervised Visual Question Answer Generation
Charani Alampalle
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
73
0
0
11 Jun 2023
3D reconstruction using Structure for Motion
Kshitij Karnawat
Hritvik Choudhari
Abhimanyu Saxena
Mudit Singal
Raajith Gadam
3DV
MDE
45
1
0
10 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
111
24
0
09 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Cuiping Li
Ziwei Liu
MLLM
VLM
105
240
0
08 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
70
10
0
08 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
134
87
0
08 Jun 2023