Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
Xinmiao Hu
C. Wang
Ruihe An
ChenYu Shao
Xiaojun Ye
Sheng Zhou
Liangcheng Li
MLLM
LRM
55
0
0
26 May 2025
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
Hyunsik Chae
Seungwoo Yoon
J. Park
Chloe Yewon Chun
Yongin Cho
Mu Cai
Yong Jae Lee
Ernest K. Ryu
CoGe
VLM
52
3
0
26 May 2025
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance
Mohammad Mahdi Moradi
Sudhir Mudur
100
0
0
25 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
358
0
0
23 May 2025
SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data
Dong-Hee Kim
Hyunjee Song
Donghyun Kim
290
0
0
23 May 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
102
0
0
22 May 2025
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMe
KELM
CLL
105
0
0
22 May 2025
Visual Question Answering on Multiple Remote Sensing Image Modalities
Hichem Boussaid
Lucrezia Tosato
F. Weissgerber
Camille Kurtz
Laurent Wendling
Sylvain Lobry
64
0
0
21 May 2025
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
Zeqing Wang
Shiyuan Zhang
Chengpei Tang
Keze Wang
LRM
79
0
0
21 May 2025
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
Ingeol Baek
Hwan Chang
Sunghyun Ryu
Hwanhee Lee
40
0
0
21 May 2025
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Hao Wang
Pinzhi Huang
Jihan Yang
Saining Xie
Daisuke Kawahara
75
0
0
21 May 2025
Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation
Jiankun Zhang
Shenglai Zeng
Jie Ren
Tianqi Zheng
Hui Liu
Xianfeng Tang
Hui Liu
Yi Chang
64
0
0
20 May 2025
Debating for Better Reasoning: An Unsupervised Multimodal Approach
Ashutosh Adhikari
Mirella Lapata
LRM
49
0
0
20 May 2025
VoQA: Visual-only Question Answering
Luyang Jiang
Jianing An
Jie Luo
Wenjun Wu
Lei Huang
LRM
101
0
0
20 May 2025
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks
Yuanze Hu
Zhaoxin Fan
Xinyu Wang
Gen Li
Ye Qiu
...
Wenjun Wu
Kejian Wu
Yifan Sun
Xiaotie Deng
Jin Song Dong
62
0
0
19 May 2025
ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models
Matteo Merler
Nicola Dainese
Minttu Alakuijala
Giovanni Bonetta
Pietro Ferrazzi
Yu Tian
Bernardo Magnini
Pekka Marttinen
LM&Ro
VLM
117
0
0
19 May 2025
IQBench: How "Smart'' Are Vision-Language Models? A Study with Human IQ Tests
Tan-Hanh Pham
Phu-Vinh Nguyen
Dang The Hung
Bui Trong Duong
Vu Nguyen Thanh
Chris Ngo
Tri Quang Truong
Truong-Son Hy
ReLM
CoGe
VLM
LRM
64
0
0
17 May 2025
Diverging Towards Hallucination: Detection of Failures in Vision-Language Models via Multi-token Aggregation
Geigh Zollicoffer
Minh Vu
Manish Bhattarai
VLM
90
0
0
16 May 2025
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
Pengju Xu
Yan Wang
Shuyuan Zhang
Xuan Zhou
Xin Li
...
Fengzhao Li
Shuigeng Zhou
Xingyu Wang
Yi Zhang
Haiying Zhao
VLM
131
1
0
16 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
74
0
0
16 May 2025
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution
Junyi Yuan
Jian Zhang
Fangyu Wu
Dongming Lu
Huanda Lu
Qiufeng Wang
93
0
0
16 May 2025
Variational Visual Question Answering
Tobias Jan Wieczorek
Nathalie Daun
Mohammad Emtiyaz Khan
Marcus Rohrbach
OOD
92
0
0
14 May 2025
Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction
Yanbin Wei
Xuehao Wang
Zhan Zhuang
Yang Chen
Shuhao Chen
Yulong Zhang
Yu Zhang
James T. Kwok
81
1
0
13 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
84
11
0
13 May 2025
Johnny: Structuring Representation Space to Enhance Machine Abstract Reasoning Ability
Ruizhuo Song
Beiming Yuan
26
0
0
13 May 2025
Explainable AI the Latest Advancements and New Trends
Bowen Long
Enjie Liu
Renxi Qiu
Yanqing Duan
XAI
159
0
0
11 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
82
0
0
11 May 2025
Multi-modal Synthetic Data Training and Model Collapse: Insights from VLMs and Diffusion Models
Zizhao Hu
Mohammad Rostami
Jesse Thomason
VLM
72
2
0
10 May 2025
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding
Dawei Huang
Qing Li
Chuan Yan
Zebang Cheng
Jiaming Ji
Xiang Li
Yangqiu Song
Xiaobei Wang
Zheng Lian
Xiaojiang Peng
65
1
0
10 May 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Wei Yang
Jingjing Fu
Rongpin Wang
Jinyu Wang
Lei Song
Jiang Bian
61
1
0
10 May 2025
What Do People Want to Know About Artificial Intelligence (AI)? The Importance of Answering End-User Questions to Explain Autonomous Vehicle (AV) Decisions
Somayeh Molaei
Lionel P. Robert
Nikola Banovic
56
0
0
09 May 2025
Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao
Zhiyu Li
Haoning Wu
Yize Zhang
Yanfeng Wang
Weidi Xie
LLMAG
95
1
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
303
1
0
05 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
135
0
0
04 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
127
0
0
03 May 2025
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
Modesto Castrillón-Santana
Oliverio J. Santana
David Freire-Obregón
Daniel Hernández-Sosa
J. Lorenzo-Navarro
136
0
0
30 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
262
1
0
28 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
Nitzan Bitton-Guetta
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
EGVM
VGen
110
0
0
24 Apr 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang
Jiashuo Sun
Weinan Zhang
Zhiqiang Hu
Xin Li
F. Wang
Deli Zhao
VLM
LRM
204
1
0
24 Apr 2025
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
Hariseetharam Gunduboina
Muhammad Haris Khan
Biplab Banerjee
VLM
94
0
0
23 Apr 2025
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj
Elias Stengel-Eskin
Jaemin Cho
Joey Tianyi Zhou
127
3
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
104
8
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
81
2
0
19 Apr 2025
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
Volkan Cevher
AAML
170
2
0
17 Apr 2025
ChartQA-X: Generating Explanations for Charts
Shamanthak Hegde
Pooyan Fazli
H. Seifi
107
0
0
17 Apr 2025
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Shravan Chaudhari
Trilokya Akula
Yoon Kim
Tom Blake
LRM
87
0
0
16 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
114
2
0
16 Apr 2025
Benchmarking Vision Language Models on German Factual Data
René Peinl
Vincent Tischler
CoGe
170
1
0
15 Apr 2025
GraphicBench: A Planning Benchmark for Graphic Design with Language Agents
Dayeon Ki
Dinesh Manocha
Marine Carpuat
Gang Wu
Puneet Mathur
Viswanathan Swaminathan
LLMAG
LM&Ro
85
0
0
15 Apr 2025
Towards Spatially-Aware and Optimally Faithful Concept-Based Explanations
Shubham Kumar
Dwip Dalal
Narendra Ahuja
85
0
0
15 Apr 2025
Previous
1
2
3
4
5
...
58
59
60
Next