ResearchTrend.AI
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang
Zhaolu Kang
Wangyuxuan Zhai
Xinyue Lou
Yunghwei Lai
...
Yawen Wang
Kaiyu Huang
Yile Wang
Peng Li
Yang Liu
14
0
0
20 Jun 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Shoubin Yu
Yue Zhang
Ziyang Wang
Jaehong Yoon
Mohit Bansal
MoE, LRM
10
0
0
20 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
16
0
0
20 Jun 2025
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng
Jinxiang Wang
Sen Wang
Zi Huang
Xue Li
LRM
19
0
0
19 Jun 2025
Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models
Xuelin Shen
Jiayin Xu
Kangsheng Yin
Wenhan Yang
AAML
19
0
0
18 Jun 2025
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Yujun Wang
Jinhe Bi
Yunpu Ma
Soeren Pirk
MLLM
46
0
0
17 Jun 2025
FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design
Kai Lan
Jiayong Zhu
Jiangtong Li
Dawei Cheng
Guang-Sheng Chen
Changjun Jiang
LRM
22
0
0
16 Jun 2025
CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation
Naihao Deng
Kapotaksha Das
Rada Mihalcea
Vitaliy Popov
M. Abouelenien
14
0
0
15 Jun 2025
MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space
Anshul Singh
Chris Biemann
Jan Strich
LMTD, LRM
19
0
0
13 Jun 2025
On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval
Seongbo Jang
Seonghyeon Lee
Dongha Lee
Hwanjo Yu
19
0
0
13 Jun 2025
EasyARC: Evaluating Vision Language Models on True Visual Reasoning
Mert Unsal
Aylin Akkus
VLM, LRM
19
0
0
13 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
117
0
0
12 Jun 2025
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
Y. Zhang
Hewei Gao
Haokun Chen
Weiguo Li
Yunpu Ma
Volker Tresp
17
0
0
12 Jun 2025
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Xiyao Wang
Zhengyuan Yang
Chao Feng
Yongyuan Liang
Yuhang Zhou
...
Chung-Ching Lin
Kevin Lin
Linjie Li
Furong Huang
Lijuan Wang
OffRL, LRM
62
0
0
11 Jun 2025
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs
Beomsik Cho
Jaehyung Kim
64
0
0
11 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
65
0
0
11 Jun 2025
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos
Benjamin Z. Reichman
Constantin Patsch
Jack Truxal
Atishay Jain
Larry Heck
43
0
0
11 Jun 2025
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Benno Krojer
Mojtaba Komeili
Candace Ross
Q. Garrido
Koustuv Sinha
Nicolas Ballas
Mahmoud Assran
66
1
0
11 Jun 2025
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta
A. Roy
Rama Chellappa
Nathaniel D. Bastian
Alvaro Velasquez
Susmit Jha
58
0
0
11 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
80
0
0
11 Jun 2025
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen
Guoqiang Gong
Tianxiang Hao
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Jungong Han
Guiguang Ding
24
0
0
10 Jun 2025
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Aniket Rege
Zinnia Nie
Mahesh Ramesh
Unmesh Raskar
Zhuoran Yu
Aditya Kusupati
Yong Jae Lee
Ramya Korlakai Vinayak
26
0
0
09 Jun 2025
ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai
Zekai Yin
Eshed Ohn-Bar
VGen
28
0
0
09 Jun 2025
Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
Haotong Qin
Cheng Hu
Michele Magno
VLM
22
0
0
09 Jun 2025
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang
Yixiao Fang
Peng Xing
Shuhan Wu
Wei Cheng
Rui Wang
Xianfang Zeng
Gang Yu
H. Chen
EGVM, VLM
30
0
0
09 Jun 2025
Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
Arnau Igualde Sáez
Lamyae Rhomrasi
Yusef Ahsini
Ricardo Vinuesa
S. Hoyas
Jose P. García Sabater
Marius J. Fullana i Alfonso
J. Alberto Conejero
LRM
10
0
0
09 Jun 2025
Language-Vision Planner and Executor for Text-to-Visual Reasoning
Yichang Xu
Gaowen Liu
Ramana Rao Kompella
Sihao Hu
Tiansheng Huang
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Ling Liu
LRM, VLM
23
0
0
09 Jun 2025
Mitigating Object Hallucination via Robust Local Perception Search
Zixian Gao
Chao Yang
Zhanhui Zhou
Xing Xu
Chaochao Lu
MLLM
15
0
0
07 Jun 2025
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Akash Gupta
Amos Storkey
Mirella Lapata
VLM
43
0
0
07 Jun 2025
A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley
Edward W. Staley
Joshua Carney
William Redman
Marie Chau
Nathan G. Drenkow
AAML, PILM
23
0
0
06 Jun 2025
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Z. Babaiee
Peyman M. Kiasari
Daniela Rus
Radu Grosu
45
0
0
06 Jun 2025
TextVidBench: A Benchmark for Long Video Scene Text Understanding
Yangyang Zhong
Ji Qi
Yuan Yao
Pengxin Luo
Yunfeng Yan
Donglian Qi
Zhiyuan Liu
Tat-Seng Chua
99
0
0
05 Jun 2025
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
Linjie Li
Mahtab Bigverdi
Jiawei Gu
Zixian Ma
Yinuo Yang
Ziang Li
Yejin Choi
Ranjay Krishna
LRM
82
0
0
05 Jun 2025
ROSA: Addressing text understanding challenges in photographs via ROtated SAmpling
Hernán Maina
Guido Ivetta
Mateo Lione Stuto
Julian Martin Eisenschlos
Jorge Sánchez
Luciana Benotti
69
0
0
04 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
51
0
0
02 Jun 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
Xingjian Diao
Tianzhen Yang
Chunhui Zhang
Weiyi Wu
Ming Cheng
Jiang Gui
70
1
0
02 Jun 2025
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models
Yihong Tang
Ao Qu
Xujing Yu
Weipeng Deng
Jun Ma
Jinhua Zhao
Lijun Sun
47
0
0
02 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Zicheng Zhang
Guangtao Zhai
MLLM
36
0
0
01 Jun 2025
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs
X. Zhu
Ziheng Jia
Jiarui Wang
Xiangyu Zhao
Haodong Duan
Xiongkuo Min
Jia Wang
Zicheng Zhang
Guangtao Zhai
EGVM, VLM
49
0
0
01 Jun 2025
What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
Zhaotian Weng
Haoxuan Li
Kuan-Hao Huang
Jieyu Zhao
LRM, CoGe
32
0
0
01 Jun 2025
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang
Ruobing Xie
Xingwu Sun
Jiansheng Chen
Zhanhui Kang
Di Wang
Yu Wang
16
0
0
31 May 2025
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Yuwen Tan
Yuan Qing
Boqing Gong
41
0
0
30 May 2025
Multi-Sourced Compositional Generalization in Visual Question Answering
Chuanhao Li
Wenbo Ye
Zhen Li
Yuwei Wu
Yunde Jia
CoGe
63
0
0
29 May 2025
Spoken question answering for visual queries
Nimrod Shabtay
Zvi Kons
Avihu Dekel
Hagai Aronowitz
R. Hoory
Assaf Arbelle
63
0
0
29 May 2025
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
87
2
0
29 May 2025
IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth
Md Touhidul Islam
Imran Kabir
Md. Alimoor Reza
Syed Masum Billah
45
0
0
28 May 2025
Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language
Guangfu Hao
Haojie Wen
Liangxuna Guo
Yang Chen
Yanchao Bi
S. Yu
62
0
0
28 May 2025
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs
Zhehan Kan
Y. Liu
Kun Yin
Xinghua Jiang
Xin Li
...
Yinsong Liu
D. Jiang
Xing Sun
Qingmin Liao
Wenming Yang
LRM
87
0
0
27 May 2025
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
Hyunsik Chae
Seungwoo Yoon
J. Park
Chloe Yewon Chun
Yongin Cho
Mu Cai
Yong Jae Lee
Ernest K. Ryu
CoGe, VLM
52
3
0
26 May 2025
Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
Xinmiao Hu
C. Wang
Ruihe An
ChenYu Shao
Xiaojun Ye
Sheng Zhou
Liangcheng Li
MLLM, LRM
55
0
0
26 May 2025