Are We on the Right Way for Evaluating Large Vision-Language Models?

29 March 2024
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Zehui Chen
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
    VLM

Papers citing "Are We on the Right Way for Evaluating Large Vision-Language Models?"

50 / 189 papers shown
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
24
0
0
15 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
30
0
0
14 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
28
0
0
13 May 2025
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa
Ryo Koshihara
Masumi Morishige
24
0
0
09 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
59
1
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
79
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu
Hee Seung Hwang
Polina Kirichenko
Olga Russakovsky
VLM
CoGe
68
0
0
30 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
ZipR1: Reinforcing Token Sparsity in MLLMs
Feng Chen
Yefei He
Lequan Lin
Jiaheng Liu
Bohan Zhuang
Qi Wu
46
0
0
23 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
62
0
0
20 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
28
2
0
14 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Bin Cui
Conghui He
Wentao Zhang
MLLM
59
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
70
15
1
14 Apr 2025
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Yuyang Ji
Haohan Wang
LRM
39
0
0
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
X. Li
Zilong Huang
Y. Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
62
2
0
14 Apr 2025
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
M. Dhouib
Davide Buscaldi
Sonia Vanier
A. Shabou
VLM
36
1
0
11 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Yong-Jin Liu
Qi Wang
Fuzheng Zhang
VLM
63
1
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Yong-Jin Liu
Qi Wang
Fuzheng Zhang
MLLM
VLM
58
0
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Qing Guo
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
69
1
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
40
3
0
10 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Y. Cao
Dahua Lin
Jiaqi Wang
OffRL
60
1
0
10 Apr 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi
Yiming Zhao
Y. Zeng
Xikun Bao
Yifan Jiang
Lin Yen-Chen
Zehui Chen
Jie Zhao
Zhongang Qi
Feng Zhao
LRM
49
0
0
10 Apr 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Yijun Liang
Ming Li
Chenrui Fan
Ziyue Li
Dang Nguyen
Kwesi Cobbina
Shweta Bhardwaj
Jiuhai Chen
Fuxiao Liu
Tianyi Zhou
VLM
CoGe
53
0
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhengzhang Chen
Zongyu Lin
MLLM
VLM
MoE
207
4
0
10 Apr 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Long Chen
MLLM
97
0
0
09 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
39
7
0
07 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
S. Chen
Jingjing Chen
Lin Ma
Yu Jiang
34
2
0
06 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
75
0
0
01 Apr 2025
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Weizhi Wang
Yu Tian
L. Yang
Heng Wang
Xifeng Yan
MLLM
VLM
79
0
0
01 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
53
0
0
31 Mar 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang
Long Chan
Jinlong Liu
Wanggui He
Hao Jiang
Mingli Song
Jingyuan Chen
Chang Yao
Jie Song
LRM
32
0
0
31 Mar 2025
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language
Yoonshik Kim
Jaeyoon Jung
37
0
0
31 Mar 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang
H. Wang
Mingshuo Chen
Di Wang
Yulin Wang
...
L. Lan
Wenjing Yang
Jingyang Zhang
Zhiyuan Liu
Maosong Sun
63
3
0
31 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yuyao Zhang
Yanfeng Wang
VLM
66
0
0
28 Mar 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
52
0
0
28 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Yixuan Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
84
3
0
28 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
90
16
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
42
1
0
26 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
75
0
0
26 Mar 2025
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
Dawei Yan
Yangfu Li
Qing-Guo Chen
Weihua Luo
Peng Wang
H. Zhang
Chunhua Shen
VGen
VLM
LRM
72
0
0
24 Mar 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models
Jie M. Zhang
Zheng Yuan
Ziyi Wang
Bei Yan
Sibo Wang
Xiangkui Cao
Zonghui Guo
Shiguang Shan
Xilin Chen
ELM
47
0
0
20 Mar 2025
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Eduard Allakhverdov
Elizaveta Goncharova
Andrey Kuznetsov
42
0
0
20 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
58
0
0
19 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
62
0
0
18 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Z. Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
K. Chen
Dahua Lin
MLLM
66
1
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yuyao Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
167
2
0
18 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
H. Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
62
18
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Yuqing Yang
Afshin Dehghan
Peter Grasch
74
3
0
17 Mar 2025