Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,890 papers shown
Title
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,868
0
26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
Q. Sun
Stefan Lee
Dhruv Batra
BDL
33
43
0
24 May 2017
Learning Convolutional Text Representations for Visual Question Answering
Zhengyang Wang
Shuiwang Ji
FAtt
19
15
0
18 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
67
578
0
18 May 2017
ParlAI: A Dialog Research Software Platform
Alexander H. Miller
Will Feng
Adam Fisch
Jiasen Lu
Dhruv Batra
Antoine Bordes
Devi Parikh
Jason Weston
40
373
0
18 May 2017
Object-Level Context Modeling For Scene Classification with Context-CNN
Syed Ashar Javed
A. Nelakanti
VLM
32
10
0
11 May 2017
Survey of Visual Question Answering: Datasets and Techniques
A. Gupta
21
38
0
10 May 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
35
541
0
10 May 2017
Combating Human Trafficking with Deep Multimodal Models
Edmund Tong
Amir Zadeh
Cara Jones
Louis-Philippe Morency
24
51
0
08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau
Douwe Kiela
Holger Schwenk
Loïc Barrault
Antoine Bordes
AI4TS
SSL
70
2,097
0
05 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
M. Bagheri
Ronald M. Summers
LM&MA
72
2,474
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
35
139
0
03 May 2017
FOIL it! Find One mismatch between Image and Language caption
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
25
137
0
03 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
21
9
0
03 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
31
141
0
02 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru
Viraj Prabhu
Akrit Mohapatra
Dhruv Batra
Stefan Lee
NAI
37
38
0
01 May 2017
Speech-Based Visual Question Answering
Ted Zhang
Dengxin Dai
Tinne Tuytelaars
Marie-Francine Moens
Luc Van Gool
40
24
0
01 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
21
247
0
28 Apr 2017
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
Aishwarya Agrawal
Aniruddha Kembhavi
Dhruv Batra
Devi Parikh
CoGe
26
80
0
26 Apr 2017
Paying Attention to Descriptions Generated by Image Captioning Models
Hamed R. Tavakoli
Rakshith Shetty
Ali Borji
Jorma T. Laaksonen
29
79
0
24 Apr 2017
Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition
Hamed R. Tavakoli
Jorma T. Laaksonen
16
1
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
24
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
22
37
0
24 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM
GNN
ReLM
LRM
42
574
0
18 Apr 2017
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching
David Novotny
Diane Larlus
Andrea Vedaldi
3DPC
33
65
0
16 Apr 2017
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Amir Mazaheri
Dong-Ming Zhang
M. Shah
17
12
0
15 Apr 2017
ShapeWorld - A new test methodology for multimodal language understanding
A. Kuhnle
Ann A. Copestake
36
66
0
14 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
34
547
0
14 Apr 2017
Spatial Memory for Context Reasoning in Object Detection
Xinlei Chen
Abhinav Gupta
ObjD
25
164
0
13 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
31
59
0
13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Y. Zhang
Luyao Yuan
Yijie Guo
Zhiyuan He
I-An Huang
Honglak Lee
ObjD
28
57
0
12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision
Siddha Ganju
Olga Russakovsky
Abhinav Gupta
19
16
0
12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders
Unnat Jain
Ziyu Zhang
Alex Schwing
25
152
0
11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
V. Kazemi
Ali Elqursh
OOD
28
184
0
11 Apr 2017
Pay Attention to Those Sets! Learning Quantification from Images
Ionut-Teodor Sorodoc
Sandro Pezzelle
Aurélie Herbelot
Mariella Dimiccoli
Raffaella Bernardi
6
0
0
10 Apr 2017
An Empirical Evaluation of Visual Question Answering for Novel Objects
Santhosh Kumar Ramakrishnan
Ambar Pal
Gaurav Sharma
Anurag Mittal
OOD
24
32
0
08 Apr 2017
It Takes Two to Tango: Towards Theory of AI's Mind
Arjun Chandrasekaran
Deshraj Yadav
Prithvijit Chattopadhyay
Viraj Prabhu
Devi Parikh
41
54
0
03 Apr 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta
Kevin J. Shih
Saurabh Singh
Derek Hoiem
37
26
0
02 Apr 2017
Towards Building Large Scale Multimodal Domain-Aware Conversation Systems
Amrita Saha
Mitesh Khapra
Karthik Sankaranarayanan
29
8
0
01 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
27
810
0
29 Mar 2017
A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment
Haonan Yu
Haichao Zhang
Wenyuan Xu
LM&Ro
11
25
0
28 Mar 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
30
231
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
31
424
0
20 Mar 2017
VQABQ: Visual Question Answering by Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
27
24
0
19 Mar 2017
Recurrent Models for Situation Recognition
Arun Mallya
Svetlana Lazebnik
20
30
0
18 Mar 2017
End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub
H. D. Vries
Jérémie Mary
Bilal Piot
Aaron Courville
Olivier Pietquin
OffRL
30
138
0
15 Mar 2017
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric Xing
29
250
0
08 Mar 2017
Asymmetric Tri-training for Unsupervised Domain Adaptation
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
49
583
0
27 Feb 2017
Previous
1
2
3
...
54
55
56
57
58
Next