1505.00468
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
92
1,238
0
23 Jul 2017
Inspiring Computer Vision System Solutions
J. Zilly
A. Boyarski
Micael Carvalho
Amir Atapour-Abarghouei
Konstantinos Amplianitis
...
Massimiliano Mancini
Hernán Gonzalez
Riccardo Spezialetti
Carlos Sampedro Pérez
Hao Li
18
1
0
22 Jul 2017
Video Question Answering via Attribute-Augmented Attention Network Learning
Yunan Ye
Zhou Zhao
Yimeng Li
Long Chen
Jun Xiao
Yueting Zhuang
80
109
0
20 Jul 2017
Visual Question Answering with Memory-Augmented Networks
Chao Ma
Chunhua Shen
A. Dick
Qi Wu
Peng Wang
Anton Van Den Hengel
Ian Reid
90
100
0
17 Jul 2017
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Aidean Sharghi
Jacob S. Laurel
Boqing Gong
EgoV
122
137
0
16 Jul 2017
Automatic Understanding of Image and Video Advertisements
Zaeem Hussain
Ruotong Wang
Xiaozhong Zhang
Keren Ye
Christopher Thomas
Zuha Agha
Nathan Ong
Adriana Kovashka
DiffM
69
166
0
10 Jul 2017
Learning Visual Reasoning Without Strong Priors
Ethan Perez
H. D. Vries
Florian Strub
Vincent Dumoulin
Aaron Courville
OOD
NAI
108
62
0
10 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
97
175
0
04 Jul 2017
Modulating early visual processing by language
H. D. Vries
Florian Strub
Jérémie Mary
Hugo Larochelle
Olivier Pietquin
Aaron Courville
192
489
0
02 Jul 2017
Compact Tensor Pooling for Visual Question Answering
Yang Shi
Tommaso Furlanello
Anima Anandkumar
16
0
0
20 Jun 2017
Identifying Spatial Relations in Images using Convolutional Neural Networks
Mandar Haldekar
Ashwinkumar Ganesan
Tim Oates
44
39
0
13 Jun 2017
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network
Xiao Yang
Ersin Yumer
P. Asente
Mike Kraley
Daniel Kifer
C. Lee Giles
78
230
0
07 Jun 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
102
137
0
05 Jun 2017
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNN
NAI
191
1,617
0
05 Jun 2017
Deep learning evaluation using deep linguistic processing
A. Kuhnle
Ann A. Copestake
ELM
59
11
0
05 Jun 2017
Listen, Interact and Talk: Learning to Speak via Interaction
Haichao Zhang
Haonan Yu
Wenyuan Xu
77
13
0
28 May 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
175
2,963
0
26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
Q. Sun
Stefan Lee
Dhruv Batra
BDL
84
44
0
24 May 2017
Learning Convolutional Text Representations for Visual Question Answering
Zhengyang Wang
Shuiwang Ji
FAtt
71
15
0
18 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
171
584
0
18 May 2017
ParlAI: A Dialog Research Software Platform
Alexander H. Miller
Will Feng
Adam Fisch
Jiasen Lu
Dhruv Batra
Antoine Bordes
Devi Parikh
Jason Weston
128
376
0
18 May 2017
Object-Level Context Modeling For Scene Classification with Context-CNN
Syed Ashar Javed
A. Nelakanti
VLM
81
10
0
11 May 2017
Survey of Visual Question Answering: Datasets and Techniques
A. Gupta
50
38
0
10 May 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
122
545
0
10 May 2017
Combating Human Trafficking with Deep Multimodal Models
Edmund Tong
Amir Zadeh
Cara Jones
Louis-Philippe Morency
82
51
0
08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau
Douwe Kiela
Holger Schwenk
Loïc Barrault
Antoine Bordes
AI4TS
SSL
254
2,106
0
05 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
M. Bagheri
Ronald M. Summers
LM&MA
261
2,558
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
87
139
0
03 May 2017
FOIL it! Find One mismatch between Image and Language caption
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
65
141
0
03 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
43
9
0
03 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
105
142
0
02 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru
Viraj Prabhu
Akrit Mohapatra
Dhruv Batra
Stefan Lee
NAI
108
38
0
01 May 2017
Speech-Based Visual Question Answering
Ted Zhang
Dengxin Dai
Tinne Tuytelaars
Marie-Francine Moens
Luc Van Gool
85
25
0
01 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
86
247
0
28 Apr 2017
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
Aishwarya Agrawal
Aniruddha Kembhavi
Dhruv Batra
Devi Parikh
CoGe
70
80
0
26 Apr 2017
Paying Attention to Descriptions Generated by Image Captioning Models
Hamed R. Tavakoli
Rakshith Shetty
Ali Borji
Jorma T. Laaksonen
80
79
0
24 Apr 2017
Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition
Hamed R. Tavakoli
Jorma T. Laaksonen
40
1
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
48
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
89
37
0
24 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM
GNN
ReLM
LRM
142
581
0
18 Apr 2017
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching
David Novotny
Diane Larlus
Andrea Vedaldi
3DPC
98
66
0
16 Apr 2017
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Amir Mazaheri
Dong Zhang
M. Shah
56
12
0
15 Apr 2017
ShapeWorld - A new test methodology for multimodal language understanding
A. Kuhnle
Ann A. Copestake
65
69
0
14 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
93
562
0
14 Apr 2017
Spatial Memory for Context Reasoning in Object Detection
Xinlei Chen
Abhinav Gupta
ObjD
101
166
0
13 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
82
61
0
13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Y. Zhang
Luyao Yuan
Yijie Guo
Zhiyuan He
I-An Huang
Honglak Lee
ObjD
92
57
0
12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision
Siddha Ganju
Olga Russakovsky
Abhinav Gupta
78
16
0
12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders
Unnat Jain
Ziyu Zhang
Alex Schwing
72
152
0
11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
110
498
0
11 Apr 2017