Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00837
Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 1,968 papers shown
Title
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
23
59
0
01 Jun 2020
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney
Kushal Kafle
Robik Shrestha
Ehsan Abbasnejad
Christopher Kanan
Anton Van Den Hengel
OODD
OOD
33
145
0
19 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Joey Tianyi Zhou
33
31
0
13 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
ZhuoSheng Zhang
Hai Zhao
Rui Wang
18
62
0
13 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
46
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
45
582
0
10 May 2020
What is Learned in Visually Grounded Neural Syntax Acquisition
Noriyuki Kojima
Hadar Averbuch-Elor
Alexander M. Rush
Yoav Artzi
16
20
0
04 May 2020
Visual Question Answering with Prior Class Semantics
Violetta Shevchenko
Damien Teney
A. Dick
Anton Van Den Hengel
BDL
20
7
0
04 May 2020
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Zarana Parekh
Jason Baldridge
Daniel Cer
Austin Waters
Yinfei Yang
19
61
0
30 Apr 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
52
230
0
30 Apr 2020
STARC: Structured Annotations for Reading Comprehension
Yevgeni Berzak
J. Malmaud
R. Levy
11
22
0
30 Apr 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
26
19
0
30 Apr 2020
Look at the First Sentence: Position Bias in Question Answering
Miyoung Ko
Jinhyuk Lee
Hyunjae Kim
Gangwoo Kim
Jaewoo Kang
FaML
OOD
27
100
0
30 Apr 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML
XAI
46
371
0
30 Apr 2020
Pragmatic Issue-Sensitive Image Captioning
Allen Nie
Reuben Cohn-Gordon
Christopher Potts
20
24
0
29 Apr 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
Xiang Zhou
Yixin Nie
Hao Tan
Joey Tianyi Zhou
28
40
0
28 Apr 2020
A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
24
9
0
27 Apr 2020
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
19
98
0
25 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
39
23
0
24 Apr 2020
Debiasing Skin Lesion Datasets and Models? Not So Fast
Alceu Bissoto
Eduardo Valle
Sandra Avila
21
54
0
23 Apr 2020
Visual Question Answering Using Semantic Information from Image Descriptions
Tasmia Tasrin
Md Sultan al Nahian
Brent Harrison
26
0
0
23 Apr 2020
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
SSL
CML
34
118
0
20 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
46
48
0
19 Apr 2020
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Huy Manh Nguyen
Tomo Miyazaki
Yoshihiro Sugaya
S. Omachi
45
1
0
16 Apr 2020
Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
Joe Stacey
Pasquale Minervini
Haim Dubossarsky
Sebastian Riedel
Tim Rocktaschel
AI4CE
28
8
0
16 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
55
1,996
0
16 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
47
1,917
0
13 Apr 2020
Visual Grounding Methods for VQA are Working for the Wrong Reasons!
Robik Shrestha
Kushal Kafle
Christopher Kanan
CML
14
35
0
12 Apr 2020
An Entropy Clustering Approach for Assessing Visual Question Difficulty
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shuníchi Satoh
OOD
AAML
31
1
0
12 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
27
2
0
10 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
21
35
0
09 Apr 2020
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Goonmeet Bajaj
Bortik Bandyopadhyay
Daniela Schmidt
Pranav Maneriker
Christopher Myers
Srinivasan Parthasarathy
28
1
0
08 Apr 2020
SHOP-VRB: A Visual Reasoning Benchmark for Object Perception
Michal Nazarczuk
K. Mikolajczyk
28
21
0
06 Apr 2020
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi
Md. Mehrab Tanjim
Julian McAuley
G. Cottrell
LRM
22
5
0
04 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
50
436
0
02 Apr 2020
DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Kyomin Jung
33
15
0
01 Apr 2020
Ontology-based Interpretable Machine Learning for Textual Data
Phung Lai
Nhathai Phan
Han Hu
Anuja Badeti
David Newman
Dejing Dou
14
8
0
01 Apr 2020
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
16
111
0
31 Mar 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM
VGen
43
68
0
25 Mar 2020
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao
Xiaodan Liang
Keze Wang
Liang Lin
GNN
18
3
0
23 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
268
22
0
22 Mar 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
72
208
0
16 Mar 2020
Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI
L. Arras
Ahmed Osman
Wojciech Samek
XAI
AAML
21
150
0
16 Mar 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
154
290
0
14 Mar 2020
Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog
Shen Gao
Preslav Nakov
Chang Liu
Li Liu
Dongyan Zhao
Rui Yan
21
32
0
10 Mar 2020
Deconfounded Image Captioning: A Causal Retrospect
Xu Yang
Hanwang Zhang
Jianfei Cai
CML
18
119
0
09 Mar 2020
PathVQA: 30000+ Questions for Medical Visual Question Answering
Xuehai He
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
LM&MA
17
215
0
07 Mar 2020
HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference
Tianyu Liu
Xin Zheng
Baobao Chang
Zhifang Sui
53
23
0
05 Mar 2020
Predicting A Creator's Preferences In, and From, Interactive Generative Art
Devi Parikh
AI4CE
20
2
0
03 Mar 2020
Previous
1
2
3
...
33
34
35
...
38
39
40
Next