Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization
Jia-Hong Huang
L. Murn
M. Mrak
Marcel Worring
ViT
152
37
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
148
56
0
23 Apr 2021
Towards Solving Multimodal Comprehension
Pritish Sahu
Karan Sikka
Ajay Divakaran
31
2
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Open Challenges on Generating Referring Expressions for Human-Robot Interaction
Fethiye Irmak Dogan
Iolanda Leite
91
4
0
19 Apr 2021
SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
Satwik Kottur
Seungwhan Moon
A. Geramifard
Babak Damavandi
86
92
0
18 Apr 2021
Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models
Tejas Srinivasan
Yonatan Bisk
VLM
83
56
0
18 Apr 2021
Question Decomposition with Dependency Graphs
Matan Hasson
Jonathan Berant
GNN
91
10
0
17 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
111
348
0
17 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
67
20
0
16 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
83
19
0
16 Apr 2021
Ensemble of MRR and NDCG models for Visual Dialog
Idan Schwartz
58
9
0
15 Apr 2021
Do Neural Network Weights account for Classes Centers?
Ioannis Kansizoglou
Loukas Bampis
Antonios Gasteratos
40
10
0
14 Apr 2021
Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention
Leon Bergen
Dzmitry Bahdanau
Timothy J. O'Donnell
FedML
29
1
0
14 Apr 2021
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
27
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
102
163
0
13 Apr 2021
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
Shailaja Keyur Sampat
Akshay Kumar
Yezhou Yang
Chitta Baral
78
26
0
13 Apr 2021
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
Jae-Won Cho
Dong-Jin Kim
Jinsoo Choi
Yunjae Jung
In So Kweon
VLM
57
17
0
13 Apr 2021
SpartQA: : A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjmashidi
56
83
0
12 Apr 2021
Towards a Collective Agenda on AI for Earth Science Data Analysis
D. Tuia
R. Roscher
Jan Dirk Wegner
Nathan Jacobs
Xiaoxiang Zhu
Gustau Camps-Valls
AI4CE
84
70
0
11 Apr 2021
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework
Santiago Castro
Ruoyao Wang
Pingxuan Huang
Ian Stewart
Oana Ignat
Nan Liu
Jonathan C. Stroud
Rada Mihalcea
AIMat
89
11
0
09 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
84
69
0
09 Apr 2021
Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning
Vikram Shree
B. Asfora
Rachel Zheng
Samantha Hong
Jacopo Banfi
M. Campbell
46
10
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
63
28
0
08 Apr 2021
Multimodal Entity Linking for Tweets
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
71
49
0
07 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
94
78
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
160
274
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
120
99
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
68
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
103
53
0
01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
227
1,194
0
01 Apr 2021
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
Aviral Joshi
Chengzhi Huang
H. Singh
54
2
0
31 Mar 2021
Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
Amrutha Saseendran
Kathrin Skubch
Margret Keuper
VLM
GAN
45
2
0
31 Mar 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
128
198
0
31 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
99
121
0
30 Mar 2021
Grounding Open-Domain Instructions to Automate Web Support Tasks
N. Xu
Sam Masling
Michael Du
Giovanni Campagna
Larry Heck
James A. Landay
M. Lam
LLMAG
AI4TS
73
44
0
30 Mar 2021
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
85
119
0
30 Mar 2021
Domain-robust VQA with diverse datasets and methods but no target labels
Ruotong Wang
Tristan D. Maidment
Ahmad Diab
Adriana Kovashka
R. Hwa
OOD
129
23
0
29 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer
Shir Gur
Lior Wolf
ViT
103
328
0
29 Mar 2021
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks
Man Luo
Shailaja Keyur Sampat
Riley Tallman
Yankai Zeng
Manuha Vancha
Akarshan Sajja
Chitta Baral
53
10
0
28 Mar 2021
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models
Arijit Ray
Michael Cogswell
Xiaoyu Lin
Kamran Alipour
Ajay Divakaran
Yi Yao
Giedrius Burachas
FAtt
40
4
0
26 Mar 2021
On the hidden treasure of dialog in video question answering
Deniz Engin
Franccois Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
76
12
0
26 Mar 2021
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Fangqiu Yi
Baoxiong Jia
Song-Chun Zhu
Yixin Zhu
132
72
0
26 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
68
26
0
24 Mar 2021
Scene-Intuitive Agent for Remote Embodied Visual Grounding
Xiangru Lin
Guanbin Li
Yizhou Yu
LM&Ro
80
53
0
24 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
60
16
0
23 Mar 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
171
146
0
23 Mar 2021
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
120
52
0
20 Mar 2021
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation
Chen Liang
Yu Wu
Yawei Luo
Yi Yang
VOS
101
30
0
19 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
64
5
0
18 Mar 2021
Previous
1
2
3
...
36
37
38
...
58
59
60
Next