1505.00468
Cited By
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
81
12
0
22 Nov 2023
NERIF: GPT-4V for Automatic Scoring of Drawn Models
Gyeong-Geon Lee
Xiaoming Zhai
83
11
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
104
4
0
21 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
73
9
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
169
291
0
21 Nov 2023
Causality is all you need
Ning Xu
Yifei Gao
Hongshuo Tian
Yongdong Zhang
An-An Liu
82
0
0
21 Nov 2023
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
Ziyue Wang
Chi Chen
Peng Li
Yang Liu
LRM
78
16
0
20 Nov 2023
An Embodied Generalist Agent in 3D World
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
118
176
0
18 Nov 2023
Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?
Bangzheng Li
Ben Zhou
Fei Wang
Xingyu Fu
Dan Roth
Muhao Chen
HILM
LRM
104
22
0
16 Nov 2023
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRM
ReLM
157
29
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
94
14
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
96
5
0
15 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
90
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
146
249
0
14 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
115
231
0
13 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLM
VLM
121
108
0
13 Nov 2023
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
Yunxin Li
Longyue Wang
Baotian Hu
Xinyu Chen
Wanqi Zhong
Chenyang Lyu
Wei Wang
Min Zhang
ELM
77
22
0
13 Nov 2023
Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings
Fatema Hasan
Yulong Li
James R. Foulds
Shimei Pan
Bishwaranjan Bhattacharjee
79
2
0
13 Nov 2023
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
Yichi Zhang
Zhuo Chen
Yin Fang
Yanxi Lu
Fangming Li
Wen Zhang
Hua-zeng Chen
114
31
0
11 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCL
ObjD
52
1
0
10 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
71
2
0
08 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
123
55
0
08 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Yue Liu
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
100
66
0
07 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
77
7
0
07 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
176
517
0
06 Nov 2023
Perturbation-based Active Learning for Question Answering
Fan Luo
Mihai Surdeanu
81
0
0
04 Nov 2023
A New Fine-grained Alignment Method for Image-text Matching
Yang Zhang
36
1
0
03 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
114
8
0
02 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
93
7
0
02 Nov 2023
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization
Suraj Jyothi Unni
Raha Moraffah
Huan Liu
86
3
0
01 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
103
11
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
154
44
0
01 Nov 2023
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts
Deepanway Ghosal
Navonil Majumder
Roy Ka-wei Lee
Rada Mihalcea
Soujanya Poria
64
8
0
31 Oct 2023
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
105
2
0
30 Oct 2023
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Changsheng Lv
Shuai Zhang
Yapeng Tian
Mengshi Qi
Huadong Ma
CML
100
18
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
63
3
0
30 Oct 2023
Learning to Follow Object-Centric Image Editing Instructions Faithfully
Tuhin Chakrabarty
Kanishk Singh
Arkadiy Saakyan
Smaranda Muresan
DiffM
77
7
0
29 Oct 2023
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
H. F. Alsan
Taner Arsan
72
2
0
29 Oct 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Seongsu Bae
Daeun Kyung
Jaehee Ryu
Eunbyeol Cho
Gyubok Lee
...
Jungwoo Oh
Lei Ji
E. Chang
Tackeun Kim
Edward Choi
120
23
0
28 Oct 2023
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Khiem Vinh Tran
Hao Phu Phan
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
54
7
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
105
14
0
27 Oct 2023
Impressions: Understanding Visual Semiotics and Aesthetic Impact
Julia Kruk
Caleb Ziems
Diyi Yang
63
2
0
27 Oct 2023
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello
Emanuele Bugliarello
Stephanie Brandl
Desmond Elliott
74
7
0
26 Oct 2023
Exploring Question Decomposition for Zero-Shot VQA
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
62
12
0
25 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
73
0
0
25 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
80
10
0
25 Oct 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
58
4
0
25 Oct 2023
Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
30
0
0
25 Oct 2023
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
88
6
0
25 Oct 2023