ResearchTrend.AI
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

16 July 2024
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
Yuan Liu
Xiao-wen Dong
Yuhang Zang
Pan Zhang
Jiaqi Wang
Yubo Ma
Kai Chen
Yifan Zhang
Shiyin Lu
Tack Hwa Wong
Weiyun Wang
Peiheng Zhou
Xiaozhe Li
Chaoyou Fu
Junbo Cui
Xiaoyi Dong
Dahua Lin
    LM&MA, VLM

Papers citing "VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models"

50 / 209 papers shown
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang
Yang Ding
Shuoshuo Zhang
Xinchen Zhang
Haoling Li
...
Jie Wu
Lei Ji
Yelong Shen
Y. Yang
Yeyun Gong
OffRL, VLM, LRM
24
0
0
17 Jun 2025
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Haibo Qiu
X. Lan
Fanfan Liu
Xiaohu Sun
Delian Ruan
Peng Shi
Lin Ma
ReLM, OffRL, LRM
37
0
0
16 Jun 2025
Evaluating Cell Type Inference in Vision Language Models Under Varying Visual Context
Samarth Singhal
Sandeep Singhal
VLM
19
0
0
15 Jun 2025
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
René Peinl
Vincent Tischler
CoGe, VLM
37
0
0
13 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
80
0
0
11 Jun 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
Zheqi He
Yesheng Liu
Jing-shu Zheng
Xuejing Li
Richeng Xuan
Jin-Ge Yao
Xi Yang
MLLM, VLM
44
0
0
10 Jun 2025
Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models
Chengyue Huang
Yuchen Zhu
Sichen Zhu
Jingyun Xiao
Moises Andrade
Shivang Chopra
Z. Kira
ReLM, VLM, LRM
15
0
0
09 Jun 2025
Vision Transformers Don't Need Trained Registers
Nick Jiang
Amil Dravid
Alexei A. Efros
Yossi Gandelsman
35
0
0
09 Jun 2025
DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
Yuhan Hao
Zhengning Li
Lei Sun
Weilong Wang
Naixin Yi
...
Caihong Qin
Mofan Zhou
Yifei Zhan
Peng Jia
Xianpeng Lang
VLM
42
0
0
06 Jun 2025
HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios
Daming Wang
Yuhao Song
Zijian He
Kangliang Chen
Xing Pan
Lu Deng
Weihao Gu
VLM, LRM
88
0
0
06 Jun 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Yan Shu
Hangui Lin
Yexin Liu
Yan Zhang
Gangyan Zeng
Yan Li
Yu Zhou
Ser-Nam Lim
Harry Yang
N. Sebe
MLLM, VLM
49
0
0
05 Jun 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Xin Jin
Zhenguo Li
James T. Kwok
Yu Zhang
LRM
98
0
0
05 Jun 2025
Can Vision Language Models Infer Human Gaze Direction? A Controlled Study
Zory Zhang
Pinyuan Feng
Bingyang Wang
Tianwei Zhao
Suyang Yu
Qingying Gao
Hokin Deng
Ziqiao Ma
Yijiang Li
Dezhi Luo
20
0
0
04 Jun 2025
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
Yang Yao
Lingyu Li
Jiaxin Song
Chiyu Chen
Zhenqi He
...
Xin Wang
Tianle Gu
Jie Li
Yan Teng
Yingchun Wang
LRM
17
0
0
03 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Zicheng Zhang
Guangtao Zhai
MLLM
28
0
0
01 Jun 2025
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
Yufei Zhan
Ziheng Wu
Yousong Zhu
Rongkun Xue
Ruipu Luo
...
Zhentao He
Zheming Yang
Ming Tang
Minghui Qiu
Jinqiao Wang
MLLM, ReLM, LRM
50
0
0
01 Jun 2025
Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective
Lei Lei
Jie Gu
Xiaokang Ma
Chu Tang
Jingmin Chen
Tong Xu
43
1
0
01 Jun 2025
Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation
Zeyu Liu
Zhitian Hou
Yining Di
Kejing Yang
Zhijie Sang
...
Siyuan Liu
Jialu Wang
Chunming Li
Ming Li
Hongxia Yang
LM&MA, LRM
15
0
0
29 May 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Y. Liu
Kun Ouyang
Haoning Wu
Yi Liu
Lin Sui
Xinhao Li
Y. Zhong
Y. Charles
Xinyu Zhou
Xu Sun
VLM, LRM
90
0
0
29 May 2025
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Sihan Yang
Runsen Xu
Yiman Xie
Sizhe Yang
Mo Li
...
Haodong Duan
Xiangyu Yue
Dahua Lin
Tai Wang
Jiangmiao Pang
VLM, LRM
53
1
0
29 May 2025
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Chan-wei Hu
Yueqi Wang
Shuo Xing
Chia-Ju Chen
Zhengzhong Tu
3DV
17
1
0
29 May 2025
NegVQA: Can Vision Language Models Understand Negation?
Yuhui Zhang
Yuchang Su
Yiming Liu
Serena Yeung-Levy
MLLM, CoGe
48
0
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM, LRM, VLM
104
0
0
28 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
59
0
0
27 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
185
1
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
95
0
0
24 May 2025
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Anjie Le
Henan Liu
Yue Wang
Zhenyu Liu
Rongkun Zhu
...
Alison Noble
Jacques Souquet
Xiaoqing Guo
Manxi Lin
Hongcheng Guo
LM&MA, ELM, VLM
68
0
0
23 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
180
0
0
23 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
197
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM, LRM
131
0
0
22 May 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao
Qixiang Yin
Jingyi Zhang
Min Yang
Yibo Wang
...
Fei Su
Li Shen
Minghui Qiu
Dacheng Tao
Jiaxing Huang
LRM
72
0
0
22 May 2025
OViP: Online Vision-Language Preference Learning
Shujun Liu
Siyuan Wang
Zejun Li
Jianxiang Wang
Cheng Zeng
Zhongyu Wei
MLLM, VLM
76
0
0
21 May 2025
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Weiming Wu
Zi-kang Wang
Jin Ye
Zhi Zhou
Yu-Feng Li
Lan-Zhe Guo
LRM
65
0
0
21 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
110
0
0
20 May 2025
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
Anna C. Doris
Md Ferdous Alam
Amin Heyrani Nobari
Faez Ahmed
80
0
0
20 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS, LRM
141
0
0
16 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
74
0
0
14 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
79
0
0
13 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM, LRM
125
4
0
04 May 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas Guibas
Minhyuk Sung
LRM
113
3
0
24 Apr 2025
RePOPE: Impact of Annotation Errors on the POPE Benchmark
Yannic Neuhaus
Matthias Hein
71
0
0
22 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
124
6
0
20 Apr 2025
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Yize Zhang
Tianyi Liang
Xinyue Huang
Erfei Cui
Xu Guo
Pei Chu
Chenhui Li
Ru Zhang
Wenhai Wang
Gongshen Liu
348
0
0
15 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
221
132
1
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM, LRM
143
5
0
14 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Tengjiao Wang
Zeang Sheng
Wentao Zhang
MLLM
153
1
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
74
5
0
14 Apr 2025
Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models
Teppei Suzuki
Keisuke Ozawa
VLM
180
0
0
14 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
153
5
0
10 Apr 2025