2407.11691
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
16 July 2024
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
Yuan Liu
Xiao-wen Dong
Yuhang Zang
Pan Zhang
Jiaqi Wang
Yubo Ma
Kai Chen
Yifan Zhang
Shiyin Lu
Tack Hwa Wong
Weiyun Wang
Peiheng Zhou
Xiaozhe Li
Chaoyou Fu
Junbo Cui
Xiaoyi Dong
Dahua Lin
LM&MA
VLM
Papers citing
"VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models"
50 / 209 papers shown
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang
Yang Ding
Shuoshuo Zhang
Xinchen Zhang
Haoling Li
...
Jie Wu
Lei Ji
Yelong Shen
Y. Yang
Yeyun Gong
OffRL
VLM
LRM
24
0
0
17 Jun 2025
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Haibo Qiu
X. Lan
Fanfan Liu
Xiaohu Sun
Delian Ruan
Peng Shi
Lin Ma
ReLM
OffRL
LRM
37
0
0
16 Jun 2025
Evaluating Cell Type Inference in Vision Language Models Under Varying Visual Context
Samarth Singhal
Sandeep Singhal
VLM
19
0
0
15 Jun 2025
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
René Peinl
Vincent Tischler
CoGe
VLM
37
0
0
13 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
80
0
0
11 Jun 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
Zheqi He
Yesheng Liu
Jing-shu Zheng
Xuejing Li
Richeng Xuan
Jin-Ge Yao
Xi Yang
MLLM
VLM
44
0
0
10 Jun 2025
Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models
Chengyue Huang
Yuchen Zhu
Sichen Zhu
Jingyun Xiao
Moises Andrade
Shivang Chopra
Z. Kira
ReLM
VLM
LRM
15
0
0
09 Jun 2025
Vision Transformers Don't Need Trained Registers
Nick Jiang
Amil Dravid
Alexei A. Efros
Yossi Gandelsman
35
0
0
09 Jun 2025
DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
Yuhan Hao
Zhengning Li
Lei Sun
Weilong Wang
Naixin Yi
...
Caihong Qin
Mofan Zhou
Yifei Zhan
Peng Jia
Xianpeng Lang
VLM
42
0
0
06 Jun 2025
HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios
Daming Wang
Yuhao Song
Zijian He
Kangliang Chen
Xing Pan
Lu Deng
Weihao Gu
VLM
LRM
88
0
0
06 Jun 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Yan Shu
Hangui Lin
Yexin Liu
Yan Zhang
Gangyan Zeng
Yan Li
Yu Zhou
Ser-Nam Lim
Harry Yang
N. Sebe
MLLM
VLM
49
0
0
05 Jun 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Xin Jin
Zhenguo Li
James T. Kwok
Yu Zhang
LRM
98
0
0
05 Jun 2025
Can Vision Language Models Infer Human Gaze Direction? A Controlled Study
Zory Zhang
Pinyuan Feng
Bingyang Wang
Tianwei Zhao
Suyang Yu
Qingying Gao
Hokin Deng
Ziqiao Ma
Yijiang Li
Dezhi Luo
20
0
0
04 Jun 2025
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
Yang Yao
Lingyu Li
Jiaxin Song
Chiyu Chen
Zhenqi He
...
Xin Wang
Tianle Gu
Jie Li
Yan Teng
Yingchun Wang
LRM
17
0
0
03 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Zicheng Zhang
Guangtao Zhai
MLLM
28
0
0
01 Jun 2025
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
Yufei Zhan
Ziheng Wu
Yousong Zhu
Rongkun Xue
Ruipu Luo
...
Zhentao He
Zheming Yang
Ming Tang
Minghui Qiu
Jinqiao Wang
MLLM
ReLM
LRM
50
0
0
01 Jun 2025
Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective
Lei Lei
Jie Gu
Xiaokang Ma
Chu Tang
Jingmin Chen
Tong Xu
43
1
0
01 Jun 2025
Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation
Zeyu Liu
Zhitian Hou
Yining Di
Kejing Yang
Zhijie Sang
...
Siyuan Liu
Jialu Wang
Chunming Li
Ming Li
Hongxia Yang
LM&MA
LRM
15
0
0
29 May 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Y. Liu
Kun Ouyang
Haoning Wu
Yi Liu
Lin Sui
Xinhao Li
Y. Zhong
Y. Charles
Xinyu Zhou
Xu Sun
VLM
LRM
90
0
0
29 May 2025
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Sihan Yang
Runsen Xu
Yiman Xie
Sizhe Yang
Mo Li
...
Haodong Duan
Xiangyu Yue
Dahua Lin
Tai Wang
Jiangmiao Pang
VLM
LRM
53
1
0
29 May 2025
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Chan-wei Hu
Yueqi Wang
Shuo Xing
Chia-Ju Chen
Zhengzhong Tu
3DV
17
1
0
29 May 2025
NegVQA: Can Vision Language Models Understand Negation?
Yuhui Zhang
Yuchang Su
Yiming Liu
Serena Yeung-Levy
MLLM
CoGe
48
0
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
104
0
0
28 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
59
0
0
27 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
185
1
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
95
0
0
24 May 2025
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Anjie Le
Henan Liu
Yue Wang
Zhenyu Liu
Rongkun Zhu
...
Alison Noble
Jacques Souquet
Xiaoqing Guo
Manxi Lin
Hongcheng Guo
LM&MA
ELM
VLM
68
0
0
23 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
180
0
0
23 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
197
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM
LRM
131
0
0
22 May 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao
Qixiang Yin
Jingyi Zhang
Min Yang
Yibo Wang
...
Fei Su
Li Shen
Minghui Qiu
Dacheng Tao
Jiaxing Huang
LRM
72
0
0
22 May 2025
OViP: Online Vision-Language Preference Learning
Shujun Liu
Siyuan Wang
Zejun Li
Jianxiang Wang
Cheng Zeng
Zhongyu Wei
MLLM
VLM
76
0
0
21 May 2025
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Weiming Wu
Zi-kang Wang
Jin Ye
Zhi Zhou
Yu-Feng Li
Lan-Zhe Guo
LRM
65
0
0
21 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
110
0
0
20 May 2025
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
Anna C. Doris
Md Ferdous Alam
Amin Heyrani Nobari
Faez Ahmed
80
0
0
20 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
141
0
0
16 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
74
0
0
14 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
79
0
0
13 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
125
4
0
04 May 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas Guibas
Minhyuk Sung
LRM
113
3
0
24 Apr 2025
RePOPE: Impact of Annotation Errors on the POPE Benchmark
Yannic Neuhaus
Matthias Hein
71
0
0
22 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
124
6
0
20 Apr 2025
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Yize Zhang
Tianyi Liang
Xinyue Huang
Erfei Cui
Xu Guo
Pei Chu
Chenhui Li
Ru Zhang
Wenhai Wang
Gongshen Liu
348
0
0
15 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
221
132
1
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
143
5
0
14 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Tengjiao Wang
Zeang Sheng
Wentao Zhang
MLLM
153
1
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
74
5
0
14 Apr 2025
Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models
Teppei Suzuki
Keisuke Ozawa
VLM
180
0
0
14 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
153
5
0
10 Apr 2025