1505.00468
Cited By
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"VQA: Visual Question Answering"
50 / 2,890 papers shown
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Qinghai Miao
MLLM
ELM
65
2
0
09 Mar 2025
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
Md Azim Khan
A. Gangopadhyay
Jianwu Wang
Robert F. Erbacher
VLM
59
0
0
08 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
68
0
0
08 Mar 2025
Automatic Teaching Platform on Vision Language Retrieval Augmented Generation
Ruslan Gokhman
Jialu Li
Youshan Zhang
VLM
48
0
0
07 Mar 2025
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
Xuelong Li
DiffM
VLM
52
0
0
06 Mar 2025
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Sunghyun Ahn
Youngwan Jo
Kijung Lee
Sein Kwon
Inpyo Hong
Sanghyun Park
63
0
0
06 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
77
0
0
05 Mar 2025
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Jiayi Zhao
Fei Teng
Kai Luo
Guoqiang Zhao
Zehan Li
Xu Zheng
Kailun Yang
VLM
79
6
0
04 Mar 2025
Are Large Vision Language Models Good Game Players?
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
ELM
LRM
104
4
0
04 Mar 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen
Tongyao Zhu
Ruochen Zhou
Jinghan Zhang
Siyang Gao
Juan Carlos Niebles
Mor Geva
Junxian He
Jiajun Wu
Manling Li
LRM
60
0
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
204
0
0
02 Mar 2025
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
Jiaqi Bai
Hongcheng Guo
Zhongyuan Peng
Jian Yang
Zhiyu Li
Mingze Li
Zhihong Tian
VLM
62
0
0
28 Feb 2025
FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts
Ziyi Zhang
Zhen Sun
Zhe Zhang
Jihui Guo
Xinlei He
AAML
59
2
0
28 Feb 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
102
8
0
28 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min-Ling Zhang
67
0
0
27 Feb 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
49
0
0
26 Feb 2025
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation
Xin Liu
Zhe Zhang
Jingxin Nie
72
0
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
73
0
0
26 Feb 2025
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz
Maitreya Patel
Yiran Luo
Tejas Gokhale
Chitta Baral
Suren Jayasuriya
Yezhou Yang
LRM
38
0
0
25 Feb 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Qiuchen Wang
Ruixue Ding
Zehui Chen
Weiqi Wu
Shihang Wang
Pengjun Xie
Feng Zhao
62
1
0
25 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
126
7
0
25 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
82
1
0
25 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
88
0
0
25 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
54
0
0
24 Feb 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen
LRM
50
0
0
24 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Jiahao Nick Li
Zhuohao Jerry Zhang
Zhang
67
1
0
24 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
60
1
0
23 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
48
0
0
23 Feb 2025
Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
Qipan Xu
Zhilin Wang
Xiaoxiao He
Ligong Han
Ruixiang Tang
41
0
0
23 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
82
8
0
21 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
51
0
0
19 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jun Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
73
1
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
47
0
0
18 Feb 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Heng Chang
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
93
1
0
17 Feb 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
LRM
110
1
0
17 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
LRM
107
3
0
17 Feb 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
Hugh Mee Wong
Rick Nouwen
Albert Gatt
51
0
0
17 Feb 2025
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu
Yaojie Shen
Chenxi Luo
Tiejian Luo
Yan Huang
Yuewei Lin
Heng Fan
L. Zhang
73
1
0
16 Feb 2025
ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis
Xueshen Li
Xinlong Hou
Ziyi Huang
Yu Gan
LM&MA
MedIm
54
0
0
15 Feb 2025
Visual Graph Question Answering with ASP and LLMs for Language Parsing
Jakob Johannes Bauer
Thomas Eiter
Nelson Higuera Ruiz
J. Oetsch
GNN
64
0
0
13 Feb 2025
Commonsense Reasoning-Aided Autonomous Vehicle Systems
Keegan Kimbrell
LRM
83
0
0
13 Feb 2025
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models
Vishal Narnaware
Ashmal Vayani
Rohit Gupta
Swetha Sirnam
Mubarak Shah
116
3
0
12 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
5
0
12 Feb 2025
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
Chashi Mahiul Islam
Samuel Jacob Chacko
Preston Horne
Xiuwen Liu
110
1
0
11 Feb 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
75
7
0
10 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
67
1
0
09 Feb 2025
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
Wanqi Yang
Yong Li
Meng Fang
L. Chen
64
1
0
09 Feb 2025
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
Minh-Quan Le
Gaurav Mittal
Tianjian Meng
A S M Iftekhar
Vishwas Suryanarayanan
Barun Patra
Dimitris Samaras
Mei Chen
DiffM
69
0
0
07 Feb 2025
Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities
Shounak Datta
Dhanasekar Sundararaman
47
1
0
28 Jan 2025