ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00837
  4. Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
ArXivPDFHTML

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 1,959 papers shown
Title
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
78
0
0
01 Apr 2025
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
Shuai Li
Jian Xu
Xiao-Hui Li
Chao Deng
Lin-Lin Huang
MQ
43
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
69
2
0
01 Apr 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang
H. Wang
Mingshuo Chen
Di Wang
Yulin Wang
...
L. Lan
Wenjing Yang
Jun Zhang
Zhiyuan Liu
Maosong Sun
63
3
0
31 Mar 2025
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
VLM
55
1
0
30 Mar 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
46
0
0
30 Mar 2025
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
Anastasiia Fadeeva
Vincent Coriou
Diego Antognini
C. Musat
Andrii Maksai
49
0
0
29 Mar 2025
Patronus: Bringing Transparency to Diffusion Models with Prototypes
Patronus: Bringing Transparency to Diffusion Models with Prototypes
Nina Weng
Aasa Feragen
Siavash Bigdeli
DiffM
41
0
0
28 Mar 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Antonia Karamolegkou
Malvina Nikandrou
Georgios Pantazopoulos
Danae Sanchez Villegas
Phillip Rust
Ruchira Dhar
Daniel Hershcovich
Anders Søgaard
39
0
0
28 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yuyao Zhang
Yanfeng Wang
VLM
66
0
0
28 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
44
0
0
27 Mar 2025
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
181
0
0
27 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
89
2
0
27 Mar 2025
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
Hongxuan Tang
Hao Liu
Xinyan Xiao
45
1
0
27 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
79
0
0
26 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
Vision as LoRA
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
88
1
0
26 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
75
0
0
26 Mar 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
110
0
0
26 Mar 2025
Gemma 3 Technical Report
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
93
41
0
25 Mar 2025
Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions
Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions
Dong Jing
Nanyi Fei
Zhiwu Lu
51
0
0
24 Mar 2025
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering
Shuo Yang
Siwen Luo
S. Han
Eduard Hovy
LRM
44
0
0
24 Mar 2025
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao
Wei Chen
Qiang Qiu
92
1
0
24 Mar 2025
On the Perception Bottleneck of VLMs for Chart Understanding
On the Perception Bottleneck of VLMs for Chart Understanding
Junteng Liu
Weihao Zeng
Xiwen Zhang
Yijun Wang
Zifei Shan
Junxian He
65
0
0
24 Mar 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval
good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval
Pranavi Kolouju
Eric Xing
Robert Pless
Nathan Jacobs
Abby Stylianou
3DV
58
0
0
22 Mar 2025
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Jianing Qi
Jiawei Liu
Hao Tang
Zhigang Zhu
106
1
0
21 Mar 2025
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan
Jiawen Shi
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
AAML
68
1
0
20 Mar 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li
Jiajun Sun
Guodong Zheng
Xiaoran Fan
Yujiong Shen
...
Wenming Tan
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
AAML
VLM
90
1
0
19 Mar 2025
Vision-Speech Models: Teaching Speech Models to Converse about Images
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer
Moritz Böhle
Gabriel de Marmiesse
Laurent Mazaré
Neil Zeghidour
Alexandre Défossez
P. Pérez
AuLLM
VLM
86
0
0
19 Mar 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
62
1
0
19 Mar 2025
Survey of Adversarial Robustness in Multimodal Large Language Models
Survey of Adversarial Robustness in Multimodal Large Language Models
Chengze Jiang
Zhuangzhuang Wang
Minjing Dong
Jie Gui
AAML
63
0
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
63
0
0
18 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
62
0
0
18 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
Hao Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
67
1
0
18 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Yansen Wang
Zijia Song
Yadong Li
Haoze Sun
Xin Wu
Zenan Zhou
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
2
0
18 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu-Xi Cheng
VLM
62
1
0
17 Mar 2025
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Haiyang Guo
Fanhu Zeng
Ziwei Xiang
Fei Zhu
Da-Han Wang
Xu-Yao Zhang
Cheng-Lin Liu
51
1
0
17 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Yiyi Zhou
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
86
1
0
17 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin
Guangzong Si
Zilei Wang
53
0
0
17 Mar 2025
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
Hao Yin
Guangzong Si
Zilei Wang
160
0
0
17 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
61
0
0
17 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
72
0
0
17 Mar 2025
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen
Yuguang Yao
Yihua Zhang
Bingquan Shen
Gaowen Liu
Sijia Liu
AAML
MU
63
1
0
14 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
A. Vosoughi
Chong Chen
Chenliang Xu
LRM
74
2
0
14 Mar 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
49
1
0
13 Mar 2025
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
Pengfei Luo
Jingbo Zhou
Tong Xu
Yuan Xia
Linli Xu
Enhong Chen
LRM
67
0
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
68
11
0
13 Mar 2025
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models
Zilu Guo
Hongbin Lin
Zhihao Yuan
C. Zheng
Pengshuo Qiu
Dongzhi Jiang
Renrui Zhang
Chun-Mei Feng
Zhen Li
MLLM
3DV
87
1
0
13 Mar 2025
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
Xudong Tan
Peng Ye
Chongjun Tu
Jianjian Cao
Yaoxin Yang
Lin Zhang
Dongzhan Zhou
Tao Chen
VLM
56
0
0
13 Mar 2025
Previous
12345...383940
Next