1505.00468
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Learning Multimodal Affinities for Textual Editing in Images
Or Perel
Oron Anschel
Omri Ben-Eliezer
Shai Mazor
Hadar Averbuch-Elor
72
1
0
18 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
89
84
0
16 Mar 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li
Ming Yan
Haiyang Xu
Fuli Luo
Wei Wang
Bin Bi
Songfang Huang
VLM
74
36
0
14 Mar 2021
What is Multimodality?
Letitia Parcalabescu
Nils Trost
Anette Frank
56
0
0
10 Mar 2021
Cross-modal Image Retrieval with Deep Mutual Information Maximization
Chunbin Gu
Jiajun Bu
Xixi Zhou
Chengwei Yao
Dongfang Ma
Zhi Yu
Xifeng Yan
57
16
0
10 Mar 2021
RL-CSDia: Representation Learning of Computer Science Diagrams
Shaowei Wang
LingLing Zhang
Xuan Luo
Yi Yang
Xin Hu
Jun Liu
3DV
35
2
0
10 Mar 2021
A Discriminative Vectorial Framework for Multi-modal Feature Representation
Lei Gao
L. Guan
29
11
0
09 Mar 2021
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
Aman Jain
Mayank Kothyari
Vishwajeet Kumar
Preethi Jyothi
Ganesh Ramakrishnan
Soumen Chakrabarti
68
36
0
09 Mar 2021
Data augmentation by morphological mixup for solving Raven's Progressive Matrices
Wentao He
Jianfeng Ren
Ruibin Bai
61
2
0
09 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
140
39
0
06 Mar 2021
Selective Replay Enhances Learning in Online Continual Analogical Reasoning
Tyler L. Hayes
Christopher Kanan
CLL
85
20
0
06 Mar 2021
Rissanen Data Analysis: Examining Dataset Characteristics via Description Length
Ethan Perez
Douwe Kiela
Kyunghyun Cho
82
24
0
05 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
101
158
0
05 Mar 2021
Visual Question Answering: which investigated applications?
Silvio Barra
Carmen Bisogni
M. De Marsico
S. Ricciardi
80
38
0
04 Mar 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
286
322
0
02 Mar 2021
MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Josiah Wang
Pranava Madhyastha
J. Figueiredo
Chiraag Lala
Lucia Specia
VGen
68
11
0
02 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
61
14
0
01 Mar 2021
KANDINSKYPatterns -- An experimental exploration environment for Pattern Analysis and Machine Intelligence
Andreas Holzinger
Anna Saranti
Heimo Mueller
114
10
0
28 Feb 2021
Learning Compositional Representation for Few-shot Visual Question Answering
Dalu Guo
Dacheng Tao
OOD
CoGe
64
4
0
21 Feb 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu
Li-Ming Zhan
Li Xu
Lin Ma
Y. Yang
Xiao-Ming Wu
106
274
0
18 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
568
1,143
0
17 Feb 2021
I Want This Product but Different : Multimodal Retrieval with Synthetic Query Expansion
Ivona Tautkute
Tomasz Trzciński
77
4
0
17 Feb 2021
Dataset Condensation with Differentiable Siamese Augmentation
Bo Zhao
Hakan Bilen
DD
273
305
0
16 Feb 2021
Composing Pick-and-Place Tasks By Grounding Language
Oier Mees
Wolfram Burgard
LM&Ro
80
37
0
16 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
191
666
0
11 Feb 2021
A Metamodel and Framework for Artificial General Intelligence From Theory to Practice
Hugo Latapie
Özkan Kiliç
Gaowen Liu
Yan Yan
Ramana Rao Kompella
Pei Wang
K. Thórisson
Adam Lawrence
Yuhong Sun
Jayanth Srinivasa
AI4CE
61
9
0
11 Feb 2021
Towards Better Explanations of Class Activation Mapping
Hyungsik Jung
Youngrock Oh
FAtt
82
79
0
10 Feb 2021
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network
Linwei Ye
Mrigank Rochan
Zhi Liu
Xiaoqin Zhang
Yang Wang
VOS
EgoV
66
57
0
09 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yebin Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
70
30
0
03 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
342
371
0
01 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
150
117
0
31 Jan 2021
M2FN: Multi-step Modality Fusion for Advertisement Image Assessment
Kyung-Wha Park
Jung-Woo Ha
Junghoon Lee
Sunyoung Kwon
Kyung-Min Kim
Byoung-Tak Zhang
21
2
0
31 Jan 2021
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
Alessandro Suglia
Yonatan Bisk
Ioannis Konstas
Antonio Vergari
E. Bastianelli
Andrea Vanzo
Oliver Lemon
40
8
0
31 Jan 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
102
67
0
28 Jan 2021
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
Tsu-Jui Fu
Wenjie Wang
Daniel J. McDuff
Yale Song
90
53
0
28 Jan 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
95
53
0
27 Jan 2021
VisualMRC: Machine Reading Comprehension on Document Images
Ryota Tanaka
Kyosuke Nishida
Sen Yoshida
101
146
0
27 Jan 2021
Unanswerable Questions about Images and Texts
E. Davis
79
12
0
25 Jan 2021
Weakly Supervised Thoracic Disease Localization via Disease Masks
Hyun-woo Kim
Hong G Jung
Seong-Whan Lee
49
9
0
25 Jan 2021
WebSRC: A Dataset for Web-Based Structural Reading Comprehension
Xingyu Chen
Zihan Zhao
Lu Chen
Danyang Zhang
Jiabao Ji
Ao Luo
Yuxuan Xiong
Kai Yu
RALM
82
98
0
23 Jan 2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
91
22
0
22 Jan 2021
Understanding in Artificial Intelligence
S. Maetschke
D. M. Iraola
Pieter Barnard
Elaheh Shafieibavani
Peter Zhong
Ying Xu
Antonio Jimeno Yepes
ELM
VLM
49
0
0
17 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
139
5
0
16 Jan 2021
Understanding the Role of Scene Graphs in Visual Question Answering
Vinay Damodaran
Sharanya Chakravarthy
Akshay Kumar
Anjana Umapathy
Teruko Mitamura
Yuta Nakashima
Noa Garcia
Chenhui Chu
GNN
169
33
0
14 Jan 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
186
178
0
13 Jan 2021
MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding
Woojeong Jin
Maziar Sanjabi
Shaoliang Nie
L Tan
Xiang Ren
Hamed Firooz
30
6
0
06 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
87
32
0
05 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
406
2,570
0
04 Jan 2021
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Yiran Xing
Z. Shi
Zhao Meng
Gerhard Lakemeyer
Yunpu Ma
Roger Wattenhofer
VLM
128
40
0
02 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Hung Le
Chinnadhurai Sankar
Seungwhan Moon
Ahmad Beirami
A. Geramifard
Satwik Kottur
VGen
93
19
0
01 Jan 2021