1410.0210
Cited By
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
1 October 2014
Mateusz Malinowski
Mario Fritz
Papers citing "A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input"
50 / 330 papers shown
Occlusion-Ordered Semantic Instance Segmentation
Soroosh Baselizadeh
Cheuk-To Yu
O. Veksler
Yuri Boykov
ISeg
3DV
58
0
0
18 Apr 2025
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
Vlad Hondru
Eduard Hogea
Darian M. Onchis
Radu Tudor Ionescu
65
1
0
18 Mar 2025
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
69
1
0
02 Dec 2024
SparrowVQE: Visual Question Explanation for Course Content Understanding
Jialu Li
Manish Kumar Thota
Ruslan Gokhman
Radek Holik
Youshan Zhang
41
1
0
12 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
33
0
0
30 Oct 2024
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Farhan Ishmam
Fabiha Haider
Fariha Tanjim Shifat
Md Fahim
Md Farhad Alam
29
0
0
19 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
44
2
0
12 Oct 2024
Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models
Hao Cheng
Erjia Xiao
Chengyuan Yu
Zhao Yao
Jiahang Cao
...
Jiaxu Wang
Mengshu Sun
Kaidi Xu
Jindong Gu
Renjing Xu
AAML
36
3
0
20 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
41
0
0
17 Sep 2024
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes
Koen Kraaijveld
Yifan Jiang
Kaixin Ma
Filip Ilievski
LRM
34
1
0
06 Sep 2024
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Su Hyeon Lim
Minkuk Kim
Hyeon Bae Kim
Seong Tae Kim
ReLM
LRM
45
0
0
30 Aug 2024
Towards Flexible Evaluation for Generative Visual Question Answering
Huishan Ji
Q. Si
Zheng Lin
Weiping Wang
36
1
0
01 Aug 2024
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
19
2
0
02 Jul 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
48
26
0
10 Jun 2024
Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Inpyo Song
Minjun Joo
Joonhyung Kwon
Jangwon Lee
EgoV
49
4
0
30 May 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
39
3
0
16 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
40
7
0
01 Apr 2024
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs
Jialou Wang
Manli Zhu
Yulei Li
Honglei Li
Long Yang
Wai Lok Woo
19
1
0
01 Apr 2024
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
Eri Onami
Shuhei Kurita
Taiki Miyanishi
Taro Watanabe
27
1
0
28 Mar 2024
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
Savitha Sam Abraham
Marjan Alirezaie
Luc de Raedt
33
1
0
05 Mar 2024
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Qing Wang
27
0
0
05 Mar 2024
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
Hao-Ran Cheng
Erjia Xiao
Jindong Gu
Le Yang
Jinhao Duan
Jize Zhang
Jiahang Cao
Kaidi Xu
Renjing Xu
39
6
0
29 Feb 2024
Grounding Language Models for Visual Entity Recognition
Zilin Xiao
Ming Gong
Paola Cascante-Bonilla
Xingyao Zhang
Jie Wu
Vicente Ordonez
VLM
48
9
0
28 Feb 2024
Visually Dehallucinative Instruction Generation
Sungguk Cha
Jusung Lee
Younghyun Lee
Cheoljong Yang
MLLM
22
5
0
13 Feb 2024
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen
D. Pitawela
Chongyang Zhao
Gengze Zhou
Hsiang-Ting Chen
Qi Wu
44
8
0
25 Dec 2023
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
25
5
0
21 Dec 2023
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
36
0
0
20 Dec 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
53
36
0
01 Nov 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
33
5
0
17 Oct 2023
Human Mobility Question Answering (Vision Paper)
Hao Xue
Flora D. Salim
32
0
0
02 Oct 2023
Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance
Lei Fan
Bo Liu
Haoxiang Li
Ying Wu
Gang Hua
31
4
0
14 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
43
7
0
05 Sep 2023
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
38
7
0
16 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
30
13
0
11 Aug 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Bo Li
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
44
21
0
26 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
42
38
0
15 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
43
7
0
14 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
76
190
0
29 May 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
David Ifeoluwa Adelani
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
Ibrahim Said Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
28
5
0
28 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
38
7
0
30 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
M. Worring
OOD
AAML
52
5
0
06 Apr 2023
I2I: Initializing Adapters with Improvised Knowledge
Tejas Srinivasan
Furong Jia
Mohammad Rostami
Jesse Thomason
CLL
34
6
0
04 Apr 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
32
8
0
05 Mar 2023
Medical visual question answering using joint self-supervised learning
Yuan Zhou
Jing Mei
Yiqin Yu
Tanveer Syeda-Mahmood
MedIm
38
1
0
25 Feb 2023
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering
T. Yamane
Pang-jo Chun
Jiachen Dang
Takayuki Okatani
18
0
0
18 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
15
1
0
28 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
35
16
0
26 Dec 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
19
40
0
15 Nov 2022
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao
Kyle Seelman
Kyungjun Lee
Hal Daumé
28
5
0
26 Oct 2022