ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

VQA: Visual Question Answering
Versions: v1, v2, v3, v4, v5, v6, v7 (latest)

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
arXiv: 1505.00468 (abs, PDF, HTML)

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
Pengfei Luo
Jingbo Zhou
Tong Xu
Yuan Xia
Linli Xu
Enhong Chen
LRM
151
0
0
13 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Junlong Li
Xiang Yue
Bo Li
Ping Nie
Dayou Du
Wenhu Chen
LRM
166
4
0
13 Mar 2025
Teaching LMMs for Image Quality Scoring and Interpreting
Zicheng Zhang
H. Wu
Ziheng Jia
Weisi Lin
Guangtao Zhai
129
2
0
12 Mar 2025
FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification
S. Sami
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
Raghuveer Rao
124
0
0
12 Mar 2025
EgoBlind: Towards Egocentric Visual Assistance for the Blind
Junbin Xiao
Nanxin Huang
Hao Qiu
Zhulin Tao
Xun Yang
Richang Hong
Ming Wang
Angela Yao
EgoV VLM
130
0
0
11 Mar 2025
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics
Jaewook Lee
Jeongah Lee
Wanyong Feng
Andrew Lan
97
0
0
10 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM ReLM
242
88
0
10 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
102
0
0
10 Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko
Tianyi Chen
Sungnyun Kim
Tianyu Ding
Luming Liang
Ilya Zharkov
Se-Young Yun
VLM
458
2
0
10 Mar 2025
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu
Zhaoxiang Liu
Huan Hu
Zezhou Chen
Kohou Wang
Ning Wang
Kai Wang
88
1
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM ObjD
97
0
0
10 Mar 2025
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction
Zongzheng Zhang
Xinrun Li
Sizhe Zou
Guoxuan Chi
Siqi Li
...
Guoliang Wang
Guantian Zheng
Leichen Wang
Hang Zhao
Hao Zhao
145
0
0
10 Mar 2025
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
Dunant Cusipuma
David Ortega
Victor Flores-Benites
Arturo Deza
OOD
159
0
0
10 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
97
0
0
09 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU OffRL LRM MLLM ReLM VLM
167
130
0
09 Mar 2025
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Qinghai Miao
MLLM ELM
109
2
0
09 Mar 2025
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
Md Azim Khan
A. Gangopadhyay
Jianwu Wang
Robert F. Erbacher
VLM
78
0
0
08 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
101
0
0
08 Mar 2025
Automatic Teaching Platform on Vision Language Retrieval Augmented Generation
Ruslan Gokhman
Jialu Li
Youshan Zhang
VLM
123
0
0
07 Mar 2025
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
Xuelong Li
DiffM VLM
88
0
0
06 Mar 2025
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Sunghyun Ahn
Youngwan Jo
Kijung Lee
Sein Kwon
Inpyo Hong
Sanghyun Park
114
1
0
06 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
111
0
0
05 Mar 2025
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Jiayi Zhao
Fei Teng
Kai Luo
Guoqiang Zhao
Zehan Li
Xu Zheng
Kailun Yang
VLM
127
7
0
04 Mar 2025
Are Large Vision Language Models Good Game Players?
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM ELM LRM
155
8
0
04 Mar 2025
OWLViz: An Open-World Benchmark for Visual Question Answering
T. Nguyen
Dang Nguyen
Hoang Nguyen
Thuan Luong
Long Hoang Dang
Viet Dac Lai
VLM
97
0
0
04 Mar 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen
Tongyao Zhu
Ruochen Zhou
Jinghan Zhang
Siyang Gao
Juan Carlos Niebles
Mor Geva
Junxian He
Jiajun Wu
Manling Li
LRM
101
3
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
421
0
0
02 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
187
15
0
28 Feb 2025
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
Ziyi Zhang
Zhen Sun
Zheng Zhang
Jihui Guo
Xinlei He
AAML
139
4
0
28 Feb 2025
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
Jiaqi Bai
Hongcheng Guo
Zhongyuan Peng
Jian Yang
Zhiyu Li
Mingze Li
Zhihong Tian
VLM
97
2
0
28 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min Zhang
108
0
0
27 Feb 2025
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation
Xin Liu
Zheng Zhang
Jingxin Nie
82
0
0
26 Feb 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
89
1
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
110
0
0
26 Feb 2025
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz
Maitreya Patel
Yiran Luo
Tejas Gokhale
Chitta Baral
Suren Jayasuriya
Yezhou Yang
LRM
106
0
0
25 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
130
1
0
25 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
203
2
0
25 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
177
7
0
25 Feb 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Qiuchen Wang
Ruixue Ding
Zehui Chen
Weiqi Wu
Shihang Wang
Pengjun Xie
Feng Zhao
111
2
0
25 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Jiahao Nick Li
Zhuohao Jerry Zhang
Zhang
180
2
0
24 Feb 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen LRM
90
0
0
24 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
119
0
0
24 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM LRM
99
3
0
23 Feb 2025
Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
Qipan Xu
Ziyi Wang
Xiaoxiao He
Ligong Han
Ruixiang Tang
57
1
0
23 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
126
1
0
23 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
165
9
0
21 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
102
9
0
21 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
81
0
0
19 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
246
2
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
97
0
0
18 Feb 2025