1505.00468
Cited By
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Yuta Saito
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
170
475
0
03 Oct 2019
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
Mehdi Neshat
Zifan Wang
Bradley Alexander
Fan Yang
Zijian Zhang
Sirui Ding
Markus Wagner
Helen Zhou
FAtt
114
1,078
0
03 Oct 2019
Embodied Language Grounding with 3D Visual Feature Representations
Mihir Prabhudesai
H. Tung
Syed Ashar Javed
Maximilian Sieb
Adam W. Harley
Katerina Fragkiadaki
113
21
0
02 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
NAI
105
9
0
30 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
62
59
0
26 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
134
449
0
25 Sep 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
149
358
0
25 Sep 2019
Question Answering is a Format; When is it Useful?
Matt Gardner
Jonathan Berant
Hannaneh Hajishirzi
Alon Talmor
Sewon Min
63
52
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
365
948
0
24 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
57
13
0
23 Sep 2019
Look, Read and Enrich. Learning from Scientific Figures and their Captions
José Manuél Gómez-Pérez
Raúl Ortega
34
11
0
19 Sep 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
67
33
0
18 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
35
1
0
17 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
48
12
0
13 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
141
13
0
11 Sep 2019
Reasoning About Human-Object Interactions Through Dual Attention Networks
Tete Xiao
Quanfu Fan
Dan Gutfreund
Mathew Monfort
A. Oliva
Bolei Zhou
56
35
0
10 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
83
65
0
10 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
125
468
0
09 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
82
40
0
08 Sep 2019
Conditional Text Generation for Harmonious Human-Machine Interaction
Bin Guo
Hao Wang
Yasan Ding
Wei Wu
Shaoyang Hao
Yueqi Sun
Zhiwen Yu
103
4
0
08 Sep 2019
Abductive Reasoning as Self-Supervision for Common Sense Question Answering
Sathyanarayanan N. Aakur
Sudeep Sarkar
LRM
SSL
OOD
58
4
0
06 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
156
248
0
06 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
57
33
0
05 Sep 2019
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Soravit Changpinyo
Bo Pang
Piyush Sharma
Radu Soricut
ObjD
60
20
0
04 Sep 2019
Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
Faisal Shafait
56
8
0
03 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
131
235
0
03 Sep 2019
What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues
Xintong Yu
Hongming Zhang
Yangqiu Song
Yan Song
Changshui Zhang
42
28
0
01 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Jiacheng Liu
Julia Hockenmaier
45
10
0
01 Sep 2019
Generating Personalized Recipes from Historical User Preferences
Bodhisattwa Prasad Majumder
Shuyang Li
Jianmo Ni
Julian McAuley
76
116
0
31 Aug 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
67
25
0
29 Aug 2019
Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings
David Schlangen
63
16
0
29 Aug 2019
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
David Schlangen
73
19
0
28 Aug 2019
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
122
188
0
28 Aug 2019
Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts
Sandro Pezzelle
Raquel Fernández
VLM
58
18
0
27 Aug 2019
Visual Question Answering using Deep Learning: A Survey and Performance Analysis
Yash Srivastava
Vaishnav Murali
S. Dubey
Snehasis Mukherjee
87
49
0
27 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
76
103
0
25 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
318
1,672
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation
Kuan-Yen Lin
Chao-Chun Hsu
Yun-Nung Chen
Lun-Wei Ku
VGen
60
20
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
254
2,498
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
102
159
0
20 Aug 2019
What is needed for simple spatial language capabilities in VQA?
A. Kuhnle
Ann A. Copestake
CoGe
35
1
0
17 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
58
27
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
116
76
0
17 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
304
907
0
16 Aug 2019
PHYRE: A New Benchmark for Physical Reasoning
A. Bakhtin
Laurens van der Maaten
Justin Johnson
Laura Gustafson
Ross B. Girshick
LRM
71
129
0
15 Aug 2019
MemeFaceGenerator: Adversarial Synthesis of Chinese Meme-face from Natural Sentences
Yifu Chen
Zongsheng Wang
Bowen Wu
Mengyuan Li
Huan Zhang
Lin Ma
Feng Liu
Qihang Feng
Baoxun Wang
CVBM
25
3
0
14 Aug 2019
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling
Yi-Ting Yeh
Tzu-Chuan Lin
Hsiao-Hua Cheng
Yuanyuan Deng
Shang-Yu Su
Yun-Nung Chen
74
16
0
14 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
95
173
0
14 Aug 2019