Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.11150
Cited By
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
17 November 2024
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Comprehensive Survey on Visual Question Answering Datasets and Algorithms"
50 / 69 papers shown
Title
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
57
62
0
04 Jun 2022
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
46
142
0
18 Sep 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
96
380
0
30 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
865
42,379
0
28 May 2020
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
SSL
CML
84
119
0
20 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
132
1,944
0
13 Apr 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
208
292
0
14 Mar 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
60
320
0
10 Jan 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
72
159
0
16 Dec 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
355
942
0
24 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
88
466
0
09 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
169
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
250
2,488
0
20 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
153
1,963
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
240
3,695
0
06 Aug 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
64
52
0
23 Jul 2019
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAI
OCL
80
260
0
09 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
87
808
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
99
373
0
24 Jun 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
108
360
0
31 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
73
161
0
24 May 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Jiayuan Mao
Chuang Gan
Pushmeet Kohli
J. Tenenbaum
Jiajun Wu
NAI
138
702
0
26 Apr 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
111
1,253
0
18 Apr 2019
Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention
Shalini Ghosh
Giedrius Burachas
Arijit Ray
Avi Ziskind
53
65
0
15 Feb 2019
Cycle-Consistency for Robust Visual Question Answering
Meet Shah
Xinlei Chen
Marcus Rohrbach
Devi Parikh
OOD
65
190
0
15 Feb 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Ramprasaath R. Selvaraju
Stefan Lee
Yilin Shen
Hongxia Jin
Shalini Ghosh
Larry Heck
Dhruv Batra
Devi Parikh
FAtt
VLM
64
254
0
11 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
57
218
0
31 Jan 2019
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
198
234
0
05 Dec 2018
TallyQA: Answering Complex Counting Questions
Manoj Acharya
Kushal Kafle
Christopher Kanan
57
125
0
29 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,175
0
11 Oct 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi
Jiajun Wu
Chuang Gan
Antonio Torralba
Pushmeet Kohli
J. Tenenbaum
NAI
84
611
0
04 Oct 2018
Cascaded Mutual Modulation for Visual Reasoning
Yiqun Yao
Jiaming Xu
Feng Wang
Bo Xu
LRM
55
14
0
06 Sep 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
63
56
0
06 Sep 2018
Explainable Neural Computation via Stack Neural Module Networks
Ronghang Hu
Jacob Andreas
Trevor Darrell
Kate Saenko
LRM
OCL
78
199
0
23 Jul 2018
Attention on Attention: Architectures for Visual Question Answering (VQA)
Jasdeep Singh
Vincent Ying
Alex Nutkiewicz
57
26
0
21 Mar 2018
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
David Mascharka
Philip Tran
Ryan Soklaski
Arjun Majumdar
116
207
0
14 Mar 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Anna Rohrbach
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
83
423
0
15 Feb 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
230
11,566
0
15 Feb 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Qing Li
Jianlong Fu
D. Yu
Tao Mei
Jiebo Luo
FAtt
XAI
CoGe
93
60
0
27 Jan 2018
Interpretable Counting for Visual Question Answering
Alexander R. Trott
Caiming Xiong
R. Socher
74
71
0
23 Dec 2017
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Guohao Li
Hang Su
Wenwu Zhu
82
46
0
03 Dec 2017
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
Pan Lu
Hongsheng Li
Wei Zhang
Jianyong Wang
Xiaogang Wang
67
80
0
18 Nov 2017
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
361
2,233
0
22 Sep 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
89
545
0
10 May 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM
GNN
ReLM
LRM
129
579
0
18 Apr 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
360
27,244
0
20 Mar 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
313
2,387
0
20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
347
3,270
0
02 Dec 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
324
20,086
0
07 Oct 2016
1
2
Next