1505.00468
Cited By
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
118
14
0
01 May 2020
HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do
Keith Curtis
G. Awad
Shahzad Rajput
I. Soboroff
16
32
0
01 May 2020
Visuo-Linguistic Question Answering (VLQA) Challenge
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
CoGe
28
1
0
01 May 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML
XAI
120
382
0
30 Apr 2020
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
107
234
0
29 Apr 2020
Pragmatic Issue-Sensitive Image Captioning
Allen Nie
Reuben Cohn-Gordon
Christopher Potts
53
24
0
29 Apr 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
84
27
0
28 Apr 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Shafiq Joty
Michael R. Lyu
Irwin King
Caiming Xiong
Guosheng Lin
116
104
0
28 Apr 2020
MCQA: Multimodal Co-attention Based Network for Question Answering
Abhishek Kumar
Trisha Mittal
Tianyi Zhou
40
14
0
25 Apr 2020
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
70
100
0
25 Apr 2020
Explicit Domain Adaptation with Loosely Coupled Samples
Oliver Scheel
L. Schwarz
Nassir Navab
Federico Tombari
OOD
40
2
0
24 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
71
23
0
24 Apr 2020
Debiasing Skin Lesion Datasets and Models? Not So Fast
Alceu Bissoto
Eduardo Valle
Sandra Avila
102
55
0
23 Apr 2020
Visual Question Answering Using Semantic Information from Image Descriptions
Tasmia Tasrin
Md Sultan al Nahian
Brent Harrison
28
0
0
23 Apr 2020
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
J. S. Park
Chandra Bhagavatula
Roozbeh Mottaghi
Ali Farhadi
Yejin Choi
ReLM
LRM
75
6
0
22 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
126
361
0
21 Apr 2020
A Revised Generative Evaluation of Visual Dialogue
Daniela Massiceti
Viveka Kulharia
P. Dokania
N. Siddharth
Philip Torr
40
0
0
20 Apr 2020
Variational Inference for Learning Representations of Natural Language Edits
Edison Marrese-Taylor
Machel Reid
Y. Matsuo
BDL
DRL
KELM
101
8
0
20 Apr 2020
Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense
Yixin Zhu
Tao Gao
Lifeng Fan
Siyuan Huang
Mark Edmonds
...
Fangqiu Yi
Siyuan Qi
Ying Nian Wu
J. Tenenbaum
Song-Chun Zhu
112
130
0
20 Apr 2020
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
SSL
CML
93
119
0
20 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
78
48
0
19 Apr 2020
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Huy Manh Nguyen
Tomo Miyazaki
Yoshihiro Sugaya
S. Omachi
144
1
0
16 Apr 2020
Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
Gi-Cheon Kang
Junseok Park
Hwaran Lee
Byoung-Tak Zhang
Jin-Hwa Kim
VLM
62
10
0
14 Apr 2020
Visual Grounding Methods for VQA are Working for the Wrong Reasons!
Robik Shrestha
Kushal Kafle
Christopher Kanan
CML
66
35
0
12 Apr 2020
An Entropy Clustering Approach for Assessing Visual Question Difficulty
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shun'ichi Satoh
OOD
AAML
60
1
0
12 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
44
2
0
10 Apr 2020
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
98
88
0
10 Apr 2020
SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks
Laetitia Teodorescu
Katja Hofmann
Pierre-Yves Oudeyer
58
1
0
09 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
50
35
0
09 Apr 2020
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Goonmeet Bajaj
Bortik Bandyopadhyay
Daniela Schmidt
Pranav Maneriker
Christopher Myers
Srinivasan Parthasarathy
35
2
0
08 Apr 2020
Query-controllable Video Summarization
Jia-Hong Huang
Marcel Worring
47
46
0
07 Apr 2020
Iterative Context-Aware Graph Inference for Visual Dialog
Dan Guo
Haibo Wang
Hanwang Zhang
Zhengjun Zha
Meng Wang
79
49
0
05 Apr 2020
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi
Md. Mehrab Tanjim
Julian McAuley
G. Cottrell
LRM
47
6
0
04 Apr 2020
Open Domain Dialogue Generation with Latent Images
Ze Yang
Wei Wu
Huang Hu
Can Xu
Wei Wang
Zhoujun Li
76
30
0
04 Apr 2020
Benchmarking Machine Reading Comprehension: A Psychological Perspective
Saku Sugawara
Pontus Stenetorp
Akiko Aizawa
54
2
0
04 Apr 2020
Evaluating Multimodal Representations on Visual Semantic Textual Similarity
Oier López de Lacalle
Ander Salaberria
Aitor Soroa Etxabe
Gorka Azkune
Eneko Agirre
41
2
0
04 Apr 2020
Learning Representations For Images With Hierarchical Labels
Ankit Dhall
SSL
50
2
0
02 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
197
440
0
02 Apr 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
57
0
0
02 Apr 2020
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
92
113
0
31 Mar 2020
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
İlker Kesen
Ozan Arkan Can
Erkut Erdem
Aykut Erdem
Deniz Yuret
VLM
55
1
0
28 Mar 2020
P ≈ NP, at least in Visual Question Answering
Shailza Jolly
Sebastián M. Palacio
Joachim Folz
Federico Raue
Jörn Hees
Andreas Dengel
24
0
0
26 Mar 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM
VGen
102
70
0
25 Mar 2020
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao
Xiaodan Liang
Keze Wang
Liang Lin
GNN
47
3
0
23 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
479
24
0
22 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
201
192
0
19 Mar 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
126
223
0
16 Mar 2020
Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI
L. Arras
Ahmed Osman
Wojciech Samek
XAI
AAML
97
157
0
16 Mar 2020
Vision-Dialog Navigation by Exploring Cross-modal Memory
Yi Zhu
Fengda Zhu
Zhaohuan Zhan
Bingqian Lin
Jianbin Jiao
Xiaojun Chang
Xiaodan Liang
VLM
91
49
0
15 Mar 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
219
294
0
14 Mar 2020