Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
Ngoc Son Nguyen
Van Nguyen
Tung Le
ViT
86
1
0
30 Jul 2024
Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness
S. Lee
Minsuk Chang
Seokhyeon Park
Jinwook Seo
101
2
0
30 Jul 2024
Autonomous Improvement of Instruction Following Skills via Foundation Models
Zhiyuan Zhou
P. Atreya
Abraham Lee
Homer Walke
Oier Mees
Sergey Levine
95
14
0
30 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li
Delin Chen
Tianle Cai
Peihao Chen
Yining Hong
Zhenfang Chen
Yikang Shen
Chuang Gan
VLM
125
5
0
29 Jul 2024
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
Xingchen Zeng
Haichuan Lin
Yilin Ye
Wei Zeng
98
17
0
29 Jul 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
72
6
0
29 Jul 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
90
2
0
28 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
50
0
0
28 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
108
1
0
25 Jul 2024
3D Question Answering for City Scene Understanding
Penglei Sun
Yaoxian Song
Xiang Liu
Xiaofei Yang
Qiang-qiang Wang
Tiefeng Li
Yang Yang
Xiaowen Chu
58
1
0
24 Jul 2024
Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun
Honggang Zhang
Di Hu
91
3
0
23 Jul 2024
Datasets of Visualization for Machine Learning
Can Liu
Ruike Jiang
Shaocong Tan
Jiacheng Yu
Chaofan Yang
Hanning Shao
Xiaoru Yuan
XAI
121
0
0
23 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLM
VLM
105
2
0
23 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
137
13
0
22 Jul 2024
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
Jaehyeong Jeon
Kibum Kim
Kanghoon Yoon
Chanyoung Park
88
2
0
22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
174
9
0
22 Jul 2024
Advancing Chart Question Answering with Robust Chart Component Recognition
Hanwen Zheng
Sijia Wang
Chris Thomas
Lifu Huang
89
1
0
19 Jul 2024
Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols
Gertjan J. Burghouts
Fieke Hillerstrom
Erwin Walraven
M. V. Bekkum
Frank Ruis
J. Sijs
Jelle van Mil
Judith Dijk
NAI
74
1
0
18 Jul 2024
EchoSight: Advancing Visual-Language Models with Wiki Knowledge
Yibin Yan
Weidi Xie
RALM
141
14
0
17 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
81
12
0
17 Jul 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
117
29
0
17 Jul 2024
Multimodal Reranking for Knowledge-Intensive Visual Question Answering
Haoyang Wen
Honglei Zhuang
Hamed Zamani
Alexander Hauptmann
Michael Bendersky
53
1
0
17 Jul 2024
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering
Tonmoy Rajkhowa
Amartya Roy Chowdhury
Sankalp Nagaonkar
A. Tripathi
40
2
0
16 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Leo Yu Zhang
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
91
11
0
16 Jul 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
Bocheng Zou
Mu Cai
Jianrui Zhang
Yong Jae Lee
74
0
1
15 Jul 2024
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan
Yantao Lai
Mengyu Qiu
Dong Liang
ViT
66
0
0
15 Jul 2024
Refusing Safe Prompts for Multi-modal Large Language Models
Zedian Shao
Hongbin Liu
Yuepeng Hu
Neil Zhenqiang Gong
MLLM
LRM
82
1
0
12 Jul 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
110
31
0
11 Jul 2024
Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images
Lucrezia Tosato
Hichem Boussaid
F. Weissgerber
Camille Kurtz
Laurent Wendling
Sylvain Lobry
71
3
0
11 Jul 2024
15M Multimodal Facial Image-Text Dataset
Dawei Dai
Yutang Li
Yingge Liu
Mingming Jia
Zhang YuanHui
Guoyin Wang
VLM
103
7
0
11 Jul 2024
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
108
20
0
11 Jul 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang
Zhenglin Cheng
Yuanyu He
Mengna Wang
Yongliang Shen
...
Guiyang Hou
Mingqian He
Yanna Ma
Weiming Lu
Yueting Zhuang
SyDa
178
13
0
09 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
107
17
0
08 Jul 2024
MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition
Hozaifa Kassab
Ahmed Mahmoud
Mohamed Bahaa
Ammar Mohamed
Ali Hamdi
VLM
109
0
0
08 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
121
2
0
08 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
70
0
0
06 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
84
42
0
06 Jul 2024
Granular Privacy Control for Geolocation with Vision Language Models
Ethan Mendes
Yang Chen
James Hays
Sauvik Das
Wei Xu
Alan Ritter
92
4
0
06 Jul 2024
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Young-Jun Lee
Dokyong Lee
Junyoung Youn
Kyeongjin Oh
ByungSoo Ko
Jonghwan Hyeon
Ho-Jin Choi
93
4
0
04 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
96
15
0
03 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
138
117
0
03 Jul 2024
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
Zhaotian Weng
Zijun Gao
Jerone Andrews
Jieyu Zhao
80
1
0
03 Jul 2024
Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability
Ruizhuo Song
Beiming Yuan
OOD
62
0
0
02 Jul 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu
Karan Sikka
Ajay Divakaran
MLLM
LRM
109
6
0
02 Jul 2024
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
59
2
0
02 Jul 2024
Image-GS: Content-Adaptive Image Representation via 2D Gaussians
Yunxiang Zhang
Bingxuan Li
Alexandr Kuznetsov
Akshay Jindal
Stavros Diolatzis
Kenneth Chen
Anton Sochenov
Anton Kaplanyan
Qi Sun
3DGS
149
4
0
02 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM
LRM
111
56
0
01 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
77
5
0
01 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
139
7
0
01 Jul 2024
Into the Unknown: Generating Geospatial Descriptions for New Environments
Tzuf Paz-Argaman
John Palowitch
Sayali Kulkarni
Reut Tsarfaty
Jason Baldridge
113
1
0
28 Jun 2024
Previous
1
2
3
...
8
9
10
...
58
59
60
Next