1505.00468
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering
Medhini Narasimhan
Svetlana Lazebnik
Alex Schwing
NAI
GNN
ReLM
69
11
0
01 Nov 2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
120
610
0
01 Nov 2018
How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria
Ozan Caglayan
Shruti Palaskar
Desmond Elliott
Loïc Barrault
Lucia Specia
Florian Metze
VGen
MLLM
107
292
0
01 Nov 2018
TallyQA: Answering Complex Counting Questions
Manoj Acharya
Kushal Kafle
Christopher Kanan
71
125
0
29 Oct 2018
Do Explanations make VQA Models more Predictable to a Human?
Arjun Chandrasekaran
Viraj Prabhu
Deshraj Yadav
Prithvijit Chattopadhyay
Devi Parikh
FAtt
150
97
0
29 Oct 2018
Middle-Out Decoding
Shikib Mehri
Leonid Sigal
68
22
0
28 Oct 2018
Fabrik: An Online Collaborative Neural Network Editor
Utsav Garg
Viraj Prabhu
Deshraj Yadav
Ram Ramrakhya
Harsh Agrawal
Dhruv Batra
GNN
65
4
0
27 Oct 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
87
152
0
25 Oct 2018
Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures
B. Vatashsky
S. Ullman
CoGe
72
1
0
25 Oct 2018
Improving Context Modelling in Multimodal Dialogue Generation
Shubham Agarwal
Ondrej Dusek
Ioannis Konstas
Verena Rieser
71
19
0
20 Oct 2018
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Shubham Agarwal
Ondrej Dusek
Ioannis Konstas
Verena Rieser
76
22
0
20 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
84
191
0
16 Oct 2018
Learning to Globally Edit Images with Textual Description
Hai Wang
Jason D. Williams
Sing Bing Kang
DiffM
75
18
0
13 Oct 2018
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
S. Ramakrishnan
Aishwarya Agrawal
Stefan Lee
AAML
72
239
0
08 Oct 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi
Jiajun Wu
Chuang Gan
Antonio Torralba
Pushmeet Kohli
J. Tenenbaum
NAI
121
614
0
04 Oct 2018
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Hyeonwoo Noh
Taehoon Kim
Jonghwan Mun
Bohyung Han
86
17
0
03 Oct 2018
Image as Data: Automated Visual Content Analysis for Political Science
Jungseock Joo
Zachary C. Steinert-Threlkeld
48
42
0
03 Oct 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
Jianwei Yang
Jiasen Lu
Stefan Lee
Dhruv Batra
Devi Parikh
103
42
0
01 Oct 2018
Learning Robust, Transferable Sentence Representations for Text Classification
Wasi Uddin Ahmad
Xueying Bai
Nanyun Peng
Kai-Wei Chang
AI4TS
OOD
61
5
0
28 Sep 2018
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC
Mark Yatskar
93
97
0
27 Sep 2018
Textually Enriched Neural Module Networks for Visual Question Answering
Khyathi Chandu
Mary Arpita Pyreddy
Matthieu Felix
N. Joshi
56
6
0
23 Sep 2018
Multimodal Dual Attention Memory for Video Story Question Answering
Kyung-Min Kim
Seongho Choi
Jin-Hwa Kim
Byoung-Tak Zhang
77
77
0
21 Sep 2018
Lessons learned in multilingual grounded language learning
Ákos Kádár
Desmond Elliott
Marc-Alexandre Côté
Grzegorz Chrupała
Afra Alishahi
VLM
112
24
0
20 Sep 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
Oliver A. Nina
Washington Garcia
Scott Clouse
Alper Yilmaz
30
4
0
19 Sep 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts
Shuming Ma
Lei Cui
Damai Dai
Furu Wei
Xu Sun
VGen
72
63
0
13 Sep 2018
The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA
Shailza Jolly
Sandro Pezzelle
T. Klein
Andreas Dengel
Moin Nabi
39
2
0
12 Sep 2018
Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances
Thao Le Minh
N. Shimizu
Takashi Miyazaki
Koichi Shinoda
32
13
0
12 Sep 2018
Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions
M. Wagner
H. Basevi
Rakshith Shetty
Wenbin Li
Mateusz Malinowski
M. Fritz
A. Leonardis
68
29
0
11 Sep 2018
The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR
Mateusz Malinowski
Carl Doersch
ReLM
65
12
0
11 Sep 2018
Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang
GNN
79
50
0
11 Sep 2018
How clever is the FiLM model, and how clever can it be?
A. Kuhnle
Huiyuan Xie
Ann A. Copestake
68
6
0
09 Sep 2018
Faithful Multimodal Explanation for Visual Question Answering
Jialin Wu
Raymond J. Mooney
85
91
0
08 Sep 2018
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Steven Derby
Paul Miller
B. Murphy
Barry Devereux
36
15
0
07 Sep 2018
Cascaded Mutual Modulation for Visual Reasoning
Yiqun Yao
Jiaming Xu
Feng Wang
Bo Xu
LRM
60
14
0
06 Sep 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
77
165
0
06 Sep 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
72
56
0
06 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
116
643
0
05 Sep 2018
Retinal Vessel Segmentation under Extreme Low Annotation: A Generative Adversarial Network Approach
A. Lahiri
V. Jain
Arnab Kumar Mondal
P. Biswas
GAN
MedIm
73
12
0
05 Sep 2018
Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering
Medhini Narasimhan
Alex Schwing
79
105
0
04 Sep 2018
RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
Semih Yagcioglu
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
CoGe
64
173
0
04 Sep 2018
Diverse and Coherent Paragraph Generation from Images
Moitreya Chatterjee
Alex Schwing
75
67
0
03 Sep 2018
Learning to Describe Differences Between Pairs of Similar Images
Harsh Jhamtani
Taylor Berg-Kirkpatrick
90
155
0
31 Aug 2018
Towards a Better Metric for Evaluating Question Generation Systems
Preksha Nema
Mitesh M. Khapra
95
108
0
30 Aug 2018
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Avikalp Srivastava
Hsin Wen Liu
Sumio Fujita
30
3
0
29 Aug 2018
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
Jianmo Ni
Chenguang Zhu
Weizhu Chen
Julian McAuley
RALM
89
38
0
28 Aug 2018
Convolutional Neural Networks for Aerial Vehicle Detection and Recognition
Amir Soleimani
Nasser M. Nasrabadi
E. Griffith
J. Ralph
Simon Maskell
28
10
0
26 Aug 2018
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers
Dongxiang Zhang
Lei Wang
Nuo Xu
B. Dai
Heng Tao Shen
ReLM
AIMat
98
127
0
22 Aug 2018
CoQA: A Conversational Question Answering Challenge
Siva Reddy
Danqi Chen
Christopher D. Manning
RALM
HAI
158
1,213
0
21 Aug 2018
Auto-Classification of Retinal Diseases in the Limit of Sparse Data Using a Two-Streams Machine Learning Model
Chao-Han Huck Yang
Fangyu Liu
Jia-Hong Huang
Meng Tian
Hiromasa Morikawa
I-Hung Lin
Yi-Chieh Liu
Hao-Hsiang Yang
Jesper N. Tegnér
81
18
0
16 Aug 2018
Context-Aware Visual Policy Network for Sequence-Level Image Captioning
Daqing Liu
Zhengjun Zha
Hanwang Zhang
Yongdong Zhang
Feng Wu
CLIP
103
104
0
16 Aug 2018