1612.00837
v3 (latest)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 2,037 papers shown
Title
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi
Lorenzo Baraldi
Marcella Cornia
M. Corsini
Rita Cucchiara
LM&Ro
96
29
0
27 Nov 2019
Learning to Learn Words from Visual Scenes
Dídac Surís
Dave Epstein
Heng Ji
Shih-Fu Chang
Carl Vondrick
VLM
CLIP
SSL
OffRL
72
4
0
25 Nov 2019
Unsupervised Keyword Extraction for Full-sentence VQA
Kohei Uehara
Tatsuya Harada
32
1
0
23 Nov 2019
Temporal Reasoning via Audio Question Answering
Haytham M. Fayek
Justin Johnson
65
54
0
21 Nov 2019
Inspect Transfer Learning Architecture with Dilated Convolution
Syeda Noor Jaha Azim
Md. Aminur Rab Ratul
31
0
0
20 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAML
FAtt
85
26
0
19 Nov 2019
Question-Conditioned Counterfactual Image Generation for VQA
Jingjing Pan
Yash Goyal
Stefan Lee
EgoV
OOD
90
19
0
14 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
100
197
0
14 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
86
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
126
338
0
10 Nov 2019
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform
Paul A. Crook
Shivani Poddar
Ankita De
Semir Shafi
David Whitney
A. Geramifard
R. Subba
48
18
0
07 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRM
ReLM
105
9
0
31 Oct 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Mohit Bansal
Jason Weston
Douwe Kiela
243
1,014
0
31 Oct 2019
Assisting human experts in the interpretation of their visual process: A case study on assessing copper surface adhesive potency
T. Hascoet
Xuejiao Deng
Daniela Mihai
Mari Sugiyama
Yuji Adachi
Sachiko Nakamura
Jonathon S. Hare
Tomoko Hayashi
T. Takiguchi
20
1
0
24 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
152
80
0
23 Oct 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
76
43
0
11 Oct 2019
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
29
1
0
08 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
93
71
0
08 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
149
25
0
30 Sep 2019
On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
NAI
105
9
0
30 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
62
59
0
26 Sep 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
161
358
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
367
948
0
24 Sep 2019
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley
Mohan Sridharan
NAI
54
0
0
23 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
59
13
0
23 Sep 2019
On Controlled DeEntanglement for Natural Language Processing
Sai Krishna Rallabandi
60
0
0
22 Sep 2019
Learning Sparse Mixture of Experts for Visual Question Answering
Vardaan Pahuja
Jie Fu
C. Pal
43
3
0
19 Sep 2019
Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Bo Wan
Desen Zhou
Yongfei Liu
Rongjie Li
Xuming He
76
200
0
18 Sep 2019
Grounding learning of modifier dynamics: An application to color naming
Xudong Han
P. Schulz
Trevor Cohn
13
5
0
17 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
39
1
0
17 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
145
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
86
65
0
10 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
139
469
0
09 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
85
40
0
08 Sep 2019
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Soravit Changpinyo
Bo Pang
Piyush Sharma
Radu Soricut
ObjD
65
20
0
04 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
133
235
0
03 Sep 2019
Out the Window: A Crowd-Sourced Dataset for Activity Classification in Security Video
Greg Castañón
N. Shnidman
Tim Anderson
J. Byrne
46
2
0
28 Aug 2019
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
David Schlangen
78
19
0
28 Aug 2019
Neural Text Summarization: A Critical Evaluation
Wojciech Kryściński
N. Keskar
Bryan McCann
Caiming Xiong
R. Socher
129
368
0
23 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
343
1,672
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
254
2,499
0
20 Aug 2019
Message Passing for Complex Question Answering over Knowledge Graphs
Svitlana Vakulenko
J. D. F. Garcia
A. Polleres
Maarten de Rijke
Michael Cochez
91
73
0
19 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
58
27
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
124
76
0
17 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
97
173
0
14 Aug 2019
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
66
66
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
109
38
0
12 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
69
82
0
10 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
278
1,975
0
09 Aug 2019