Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings
Kiran Ramnath
M. Hasegawa-Johnson
66
9
0
31 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
93
30
0
30 Dec 2020
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu
J. Rose
VLM
50
87
0
23 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
60
74
0
23 Dec 2020
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu
Albert Gatt
Anette Frank
Iacer Calixto
LRM
101
49
0
22 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
82
16
0
21 Dec 2020
Learning content and context with language bias for Visual Question Answering
Chao Yang
Su Feng
Dongsheng Li
Huawei Shen
Guoqing Wang
Bin Jiang
68
21
0
21 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
128
188
0
20 Dec 2020
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
71
3
0
18 Dec 2020
On Modality Bias in the TVQA Dataset
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
87
35
0
18 Dec 2020
Can Transformers Reason About Effects of Actions?
Pratyay Banerjee
Chitta Baral
Man Luo
Arindam Mitra
Kuntal Kumar Pal
Tran Cao Son
Neeraj Varshney
LRM
AI4CE
79
10
0
17 Dec 2020
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
Xi Zhu
Zhendong Mao
Chunxiao Liu
Peng Zhang
Bin Wang
Yongdong Zhang
SSL
58
117
0
17 Dec 2020
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
41
18
0
16 Dec 2020
Towards Recognizing New Semantic Concepts in New Visual Domains
Massimiliano Mancini
OOD
141
0
0
16 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
98
44
0
15 Dec 2020
Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution
Ron Zhu
73
81
0
15 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
94
37
0
14 Dec 2020
Learning Contextual Causality from Time-consecutive Images
Hongming Zhang
Yintong Huo
Xinran Zhao
Yangqiu Song
Dan Roth
CML
59
6
0
13 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
99
67
0
10 Dec 2020
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
92
54
0
09 Dec 2020
Intrinsically Motivated Compositional Language Emergence
Rishi Hazra
Sonu Dixit
Sayambhu Sen
65
1
0
09 Dec 2020
Emotional Conversation Generation with Heterogeneous Graph Neural Network
Yunlong Liang
Fandong Meng
Yingxue Zhang
Jinan Xu
Jinan Xu
Jie Zhou
68
25
0
09 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
107
144
0
08 Dec 2020
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
Ramprasaath R. Selvaraju
Karan Desai
Justin Johnson
Nikhil Naik
SSL
101
80
0
08 Dec 2020
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
.Ilker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
61
31
0
08 Dec 2020
Edited Media Understanding Frames: Reasoning About the Intent and Implications of Visual Misinformation
Jeff Da
Maxwell Forbes
Rowan Zellers
Anthony Zheng
Jena D. Hwang
Antoine Bosselut
Yejin Choi
DiffM
87
13
0
08 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
110
36
0
04 Dec 2020
Classification of Multimodal Hate Speech -- The Winning Solution of Hateful Memes Challenge
Xiayu Zhong
59
15
0
02 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
102
120
0
30 Nov 2020
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani
Nobline Yoo
William Fu-Hinthorn
Olga Russakovsky
3DPC
82
38
0
27 Nov 2020
Transformation Driven Visual Reasoning
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
85
23
0
26 Nov 2020
Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption
Ivan Evtimov
Russ Howes
Brian Dolhansky
Hamed Firooz
Cristian Canton Ferrer
AAML
49
10
0
25 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
87
61
0
25 Nov 2020
XTQA: Span-Level Explanations of the Textbook Question Answering
Jie Ma
Q. Zheng
Jun Liu
Qingyu Yin
Jianlong Zhou
Y. Huang
34
13
0
25 Nov 2020
Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention
Varnith Chordia
B. Vijaykumar
57
7
0
23 Nov 2020
Interpretable Visual Reasoning via Induced Symbolic Space
Zhonghao Wang
Kai Wang
Mo Yu
Jinjun Xiong
Wen-mei W. Hwu
M. Hasegawa-Johnson
Humphrey Shi
LRM
OCL
63
20
0
23 Nov 2020
Video SemNet: Memory-Augmented Video Semantic Network
Prashanth Vijayaraghavan
D. Roy
40
0
0
22 Nov 2020
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
75
4
0
19 Nov 2020
Improving Calibration in Deep Metric Learning With Cross-Example Softmax
Andreas Veit
Kimberly Wilber
24
2
0
17 Nov 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
66
37
0
16 Nov 2020
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar
Farhad Pourpanah
Sadiq Hussain
Dana Rezazadegan
Li Liu
...
Xiaochun Cao
Abbas Khosravi
U. Acharya
V. Makarenkov
S. Nahavandi
BDL
UQCV
380
1,952
0
12 Nov 2020
Deep Multimodal Fusion by Channel Exchanging
Yikai Wang
Wenbing Huang
Gang Hua
Tingyang Xu
Yu Rong
Junzhou Huang
76
246
0
10 Nov 2020
Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning
Rachel Gardner
M. Varma
Clare Zhu
Ranjay Krishna
43
6
0
10 Nov 2020
Multi-document Summarization via Deep Learning Techniques: A Survey
Congbo Ma
W. Zhang
Mingyu Guo
Hu Wang
Quan Z. Sheng
125
129
0
10 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
66
7
0
05 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
55
45
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
Michael R. Lyu
Irwin King
75
16
0
03 Nov 2020
Pairwise Relations Discriminator for Unsupervised Raven's Progressive Matrices
Nicholas Quek Wei Kiat
Duo Wang
M. Jamnik
81
5
0
02 Nov 2020
Reasoning Over History: Context Aware Visual Dialog
Muhammad A. Shah
Shikib Mehri
Tejas Srinivasan
33
4
0
02 Nov 2020
DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation
Jia-Hong Huang
Chao-Han Huck Yang
Fangyu Liu
Meng Tian
Yi-Chieh Liu
...
Kang Wang
Hiromasa Morikawa
Hernghua Chang
Jesper N. Tegnér
M. Worring
MedIm
66
48
0
01 Nov 2020
Previous
1
2
3
...
38
39
40
...
58
59
60
Next