Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.17331
Cited By
v1
v2 (latest)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
17 February 2025
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Liang Lin
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering"
20 / 70 papers shown
Title
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
68
158
0
16 Dec 2021
Learning Efficient Multi-Agent Cooperative Visual Exploration
Chao Yu
Xinyi Yang
Jiaxuan Gao
Huazhong Yang
Yu Wang
Yi Wu
56
30
0
12 Oct 2021
Coarse-to-Fine Reasoning for Visual Question Answering
Binh X. Nguyen
Tuong Khanh Long Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
NAI
113
39
0
06 Oct 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
251
421
0
10 Sep 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
70
19
0
24 Jul 2021
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
107
380
0
31 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
104
185
0
20 Dec 2020
Modeling Voters in Multi-Winner Approval Voting
J. Scheuerman
Jason L. Harman
Nicholas Mattei
Kristen Brent Venable
28
5
0
04 Dec 2020
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng
Bo Pang
P. Sharma
Radu Soricut
72
25
0
04 Dec 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
69
31
0
24 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
667
41,369
0
22 Oct 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
76
498
0
11 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
838
42,332
0
28 May 2020
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
247
2,488
0
20 Aug 2019
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAI
OCL
77
260
0
09 Jul 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
74
173
0
10 May 2019
ASER: A Large-scale Eventuality Knowledge Graph
Hongming Zhang
Xin Liu
Haojie Pan
Yangqiu Song
C. Leung
SLR
77
163
0
01 May 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
117
325
0
20 Jan 2019
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
345
3,270
0
02 Dec 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
214
5,497
0
03 May 2015
Previous
1
2