Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales
Michele Cafagna
Kees van Deemter
Albert Gatt
3DV
72
4
0
23 Feb 2023
EVJVQA Challenge: Multilingual Visual Question Answering
Ngan Luu-Thuy Nguyen
Nghia Hieu Nguyen
Duong T.D. Vo
K. Tran
Kiet Van Nguyen
95
7
0
23 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
192
216
0
20 Feb 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Xinyue Hu
Lin Gu
Kazuma Kobayashi
Qi A. An
Qingyu Chen
Zhiyong Lu
Chang Su
Tatsuya Harada
Yingying Zhu
GNN
71
10
0
19 Feb 2023
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering
T. Yamane
Pang-jo Chun
Jiachen Dang
Takayuki Okatani
32
0
0
18 Feb 2023
On The Coherence of Quantitative Evaluation of Visual Explanations
Benjamin Vandersmissen
José Oramas
XAI
FAtt
106
4
0
14 Feb 2023
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Zhu Wang
Sourav Medya
Sathya Ravi
VLM
100
0
0
11 Feb 2023
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
105
52
0
10 Feb 2023
Learning by Asking for Embodied Visual Navigation and Task Completion
Ying Shen
Ismini Lourentzou
86
1
0
09 Feb 2023
Structured Generative Models for Scene Understanding
Christopher K. I. Williams
OCL
3DV
92
3
0
07 Feb 2023
MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Jiaying Lu
Yongchen Qian
Shifan Zhao
Yuanzhe Xi
Carl Yang
VLM
78
4
0
06 Feb 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
100
7
0
04 Feb 2023
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Fan Liu
Liqiang Nie
Mohan S. Kankanhalli
95
10
0
04 Feb 2023
Vertical Federated Learning: Taxonomies, Threats, and Prospects
Qun Li
Chandra Thapa
Lawrence Ong
Yifeng Zheng
Hua Ma
S. Çamtepe
Anmin Fu
Yan Gao
FedML
120
11
0
03 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
107
32
0
01 Feb 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Ying Jin
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
ViT
133
39
0
31 Jan 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
Pengzhen Ren
Changlin Li
Hang Xu
Yi Zhu
Guangrun Wang
Jian-zhuo Liu
Xiaojun Chang
Xiaodan Liang
106
45
0
31 Jan 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM
3DV
63
16
0
30 Jan 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
57
1
0
28 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
91
3
0
25 Jan 2023
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
Kun Li
G. Vosselman
M. Yang
85
7
0
23 Jan 2023
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan-George Pasca
Alexey Gavryushin
Muhammad Hamza
Yen-Ling Kuo
Kaichun Mo
Luc Van Gool
Otmar Hilliges
Xi Wang
171
14
0
22 Jan 2023
Champion Solution for the WSDM2023 Toloka VQA Challenge
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
90
2
0
22 Jan 2023
Tensor Networks Meet Neural Networks: A Survey and Future Perspectives
Maolin Wang
Yu Pan
Zenglin Xu
Xiangli Yang
Guangxi Li
A. Cichocki
Andrzej Cichocki
210
22
0
22 Jan 2023
Improving Zero-Shot Action Recognition using Human Instruction with Text Description
Na Wu
Hiroshi Kera
K. Kawamoto
67
7
0
21 Jan 2023
Joint Representation Learning for Text and 3D Point Cloud
Rui Huang
Xuran Pan
Henry Zheng
Haojun Jiang
Zhifeng Xie
S. Song
Gao Huang
95
21
0
18 Jan 2023
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
77
13
0
18 Jan 2023
Curriculum Script Distillation for Multilingual Visual Question Answering
Khyathi Chandu
A. Geramifard
71
0
0
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
107
57
0
16 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
118
41
0
12 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
84
9
0
11 Jan 2023
MAQA: A Multimodal QA Benchmark for Negation
Judith Yue Li
Aren Jansen
Qingqing Huang
Joonseok Lee
Ravi Ganti
Dima Kuzmin
81
5
0
09 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
120
4
0
05 Jan 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
180
0
0
05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
79
16
0
26 Dec 2022
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
Yutaro Yamada
Yingtian Tang
Yoyo Zhang
Ilker Yildirim
CoGe
64
15
0
22 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
134
259
0
21 Dec 2022
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Jiaxian Guo
Junnan Li
Dongxu Li
A. M. H. Tiong
Boyang Albert Li
Dacheng Tao
Steven C. H. Hoi
VLM
MLLM
101
118
0
21 Dec 2022
Is GPT-3 a Good Data Annotator?
Bosheng Ding
Chengwei Qin
Linlin Liu
Yew Ken Chia
Shafiq Joty
Boyang Albert Li
Lidong Bing
97
250
0
20 Dec 2022
Are Deep Neural Networks SMARTer than Second Graders?
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Kevin A. Smith
J. Tenenbaum
AAML
LRM
ReLM
116
31
0
20 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
77
38
0
19 Dec 2022
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning
Hui Li
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Yao-Min Zhao
90
21
0
17 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
120
5
0
16 Dec 2022
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Bernd Bohnet
Vinh Q. Tran
Pat Verga
Roee Aharoni
D. Andor
...
Michael Collins
Dipanjan Das
Donald Metzler
Slav Petrov
Kellie Webster
128
65
0
15 Dec 2022
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Chengzhi Mao
Revant Teotia
Amrutha Sundar
Sachit Menon
Junfeng Yang
Xin Eric Wang
Carl Vondrick
68
31
0
12 Dec 2022
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Sunjae Yoon
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
Changdong Yoo
70
20
0
12 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
108
96
0
10 Dec 2022
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
Jishnu Mukhoti
Tsung-Yu Lin
Omid Poursaeed
Rui Wang
Ashish Shah
Philip Torr
Ser-Nam Lim
VLM
137
83
0
09 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
85
28
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
88
16
0
08 Dec 2022
Previous
1
2
3
...
23
24
25
...
58
59
60
Next