1505.00468
Cited By
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,890 papers shown
Title
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
30
48
0
23 Feb 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
35
26
0
22 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
30
386
0
19 Feb 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
33
371
0
07 Feb 2017
Living a discrete life in a continuous world: Reference with distributed representations
Gemma Boleda
Sebastian Padó
N. Pham
Marco Baroni
8
0
0
06 Feb 2017
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
N. Mostafazadeh
Chris Brockett
W. Dolan
Michel Galley
Jianfeng Gao
Georgios P. Spithourakis
Lucy Vanderwende
26
181
0
28 Jan 2017
Context-aware Captions from Context-agnostic Supervision
Ramakrishna Vedantam
Samy Bengio
Kevin Patrick Murphy
Devi Parikh
Gal Chechik
22
152
0
11 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
46
273
0
30 Dec 2016
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
Laurens van der Maaten
VLM
20
136
0
29 Dec 2016
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Nan Ding
Sebastian Goodman
Fei Sha
Radu Soricut
VLM
27
9
0
22 Dec 2016
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
53
2,322
0
20 Dec 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang
Lizhen Qu
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
27
79
0
20 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton Van Den Hengel
OOD
39
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
24
79
0
14 Dec 2016
Learning to Hash-tag Videos with Tag2Vec
A. Singh
Saurabh Saini
R. Shah
P. J. Narayanan
27
1
0
13 Dec 2016
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
Marc Bolaños
Álvaro Peris
F. Casacuberta
Petia Radeva
32
6
0
12 Dec 2016
MarioQA: Answering Questions by Watching Gameplay Videos
Jonghwan Mun
Paul Hongsuck Seo
Ilchae Jung
Bohyung Han
50
108
0
06 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
44
165
0
05 Dec 2016
Deep Multi-Modal Image Correspondence Learning
Chen Liu
Jiajun Wu
Pushmeet Kohli
Yasutaka Furukawa
13
5
0
05 Dec 2016
Who is Mistaken?
Benjamin Eysenbach
Carl Vondrick
Antonio Torralba
35
15
0
04 Dec 2016
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Mark Yatskar
Vicente Ordonez
Luke Zettlemoyer
Ali Farhadi
VLM
17
42
0
03 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
146
3,130
0
02 Dec 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
69
990
0
26 Nov 2016
GuessWhat?! Visual object discovery through multi-modal dialogue
H. D. Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
VLM
50
427
0
23 Nov 2016
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Tegan Maharaj
Nicolas Ballas
Anna Rohrbach
Aaron Courville
C. Pal
VGen
15
107
0
23 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
30
169
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
33
189
0
21 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
38
10
0
20 Nov 2016
Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic
Somak Aditya
Yezhou Yang
Chitta Baral
Yiannis Aloimonos
ReLM
14
4
0
17 Nov 2016
Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt
17
63
0
17 Nov 2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
Long Chen
Hanwang Zhang
Jun Xiao
Liqiang Nie
Jian Shao
Wei Liu
Tat-Seng Chua
27
1,650
0
17 Nov 2016
Zero-Shot Visual Question Answering
Damien Teney
Anton Van Den Hengel
29
73
0
17 Nov 2016
The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives
Mohit Iyyer
Varun Manjunatha
Anupam Guha
Yogarshi Vyas
Jordan L. Boyd-Graber
Hal Daumé
L. Davis
30
95
0
16 Nov 2016
Leveraging Video Descriptions to Learn Video Question Answering
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
32
175
0
12 Nov 2016
Crowdsourcing in Computer Vision
Adriana Kovashka
Olga Russakovsky
Li Fei-Fei
Kristen Grauman
HAI
VLM
3DV
49
149
0
07 Nov 2016
Dynamic Coattention Networks For Question Answering
Caiming Xiong
Victor Zhong
R. Socher
AIMat
40
684
0
05 Nov 2016
Bidirectional Attention Flow for Machine Comprehension
Minjoon Seo
Aniruddha Kembhavi
Ali Farhadi
Hannaneh Hajishirzi
65
2,087
0
05 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
39
664
0
02 Nov 2016
End-to-end Learning of Deep Visual Representations for Image Retrieval
Albert Gordo
Jon Almazán
Jérôme Revaud
Diane Larlus
VLM
30
536
0
25 Oct 2016
Proposing Plausible Answers for Open-ended Visual Question Answering
Omid Bakhshandeh
Trung Bui
Zhe-nan Lin
W. Chang
29
1
0
20 Oct 2016
Deep Identity-aware Transfer of Facial Attributes
Mu Li
W. Zuo
David C. Zhang
CVBM
35
149
0
18 Oct 2016
Video Fill in the Blank with Merging LSTMs
Amir Mazaheri
Dong-Ming Zhang
M. Shah
32
18
0
13 Oct 2016
Open-Ended Visual Question-Answering
Issey Masuda
Santiago Pascual de la Puente
Xavier Giró-i-Nieto
28
9
0
09 Oct 2016
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Ashwin K. Vijayakumar
Michael Cogswell
Ramprasaath R. Selvaraju
Q. Sun
Stefan Lee
David J. Crandall
Dhruv Batra
28
542
0
07 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
68
19,607
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
33
235
0
05 Oct 2016
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
37
509
0
03 Oct 2016
Contextual RNN-GANs for Abstract Reasoning Diagram Generation
Arna Ghosh
Viveka Kulharia
A. Mukerjee
Vinay P. Namboodiri
Joey Tianyi Zhou
GAN
33
37
0
29 Sep 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Atousa Torabi
Niket Tandon
Leonid Sigal
22
97
0
26 Sep 2016