ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.11691
  4. Cited By
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
v1v2v3 (latest)

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

16 July 2024
Haodong Duan
Junming Yang
Junming Yang
Xinyu Fang
Lin Chen
Yuan Liu
Xiao-wen Dong
Yuhang Zang
Pan Zhang
Jiaqi Wang
Yubo Ma
Kai Chen
Yifan Zhang
Shiyin Lu
Tack Hwa Wong
Weiyun Wang
Peiheng Zhou
Xiaozhe Li
Chaoyou Fu
Junbo Cui
Xiaoyi Dong
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
    LM&MAVLM
ArXiv (abs)PDFHTML

Papers citing "VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models"

50 / 209 papers shown
Title
MM-IFEngine: Towards Multimodal Instruction Following
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
153
5
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Yang Liu
Qi Wang
Fuzheng Zhang
VLM
145
2
0
10 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
99
16
0
07 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
173
0
0
02 Apr 2025
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
Ezzeldin Shereen
Dan Ristea
Burak Hasircioglu
Shae McFadden
V. Mavroudis
Chris Hicks
183
0
0
02 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
139
2
0
01 Apr 2025
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang
Hao Zou
Bochen Wang
Ye Xi
Zhen Xie
Hao Wang
VLM
93
0
0
31 Mar 2025
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
VLM
130
3
0
30 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
113
0
0
26 Mar 2025
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
Mehdi Moshtaghi
Siavash H. Khajavi
Joni Pajarinen
VLM
148
0
0
25 Mar 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELMLRM
144
9
0
25 Mar 2025
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
Dawei Yan
Yangfu Li
Qing-Guo Chen
Weihua Luo
Peng Wang
Han Zhang
Chunhua Shen
VGenVLMLRM
85
1
0
24 Mar 2025
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li
Yiyuan Zhang
Longteng Guo
Xiangyu Yue
Jing Liu
VLM
166
1
0
24 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
VLM
118
19
0
23 Mar 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
139
1
0
19 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Zheyu Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
Kai Chen
Dahua Lin
MLLM
111
3
0
18 Mar 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
84
0
0
18 Mar 2025
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Cheng Yuan
Ziqiang Liu
Jiashu Lv
Jiawei Shao
Yufei Jiang
Jing Zhang
Xuelong Li
109
1
0
17 Mar 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Hai-Long Sun
Zhun Sun
Houwen Peng
Han-Jia Ye
LRM
141
6
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Yue Yang
Afshin Dehghan
Peter Grasch
126
5
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
98
0
0
16 Mar 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan
Wenzhen Yuan
Xian Gao
Ye Guo
Daoxin Zhang
Zhe Xu
Yao Hu
Ting Liu
Yuzhuo Fu
LRMVLM
165
6
0
10 Mar 2025
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
Dunant Cusipuma
David Ortega
Victor Flores-Benites
Arturo Deza
OOD
159
0
0
10 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
Kai Zhang
ELM
479
1
0
09 Mar 2025
Unified Reward Model for Multimodal Understanding and Generation
Yibin Wang
Yuhang Zang
Hao Li
Cheng Jin
Jiadong Wang
EGVM
165
11
0
07 Mar 2025
Are Large Vision Language Models Good Game Players?
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLMELMLRM
155
8
0
04 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
105
1
0
04 Mar 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Rui Hu
Delai Qiu
Shuyu Wei
J.N. Zhang
Yining Wang
Shengping Liu
Jitao Sang
AuLLMVLM
127
0
0
27 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min Zhang
108
0
0
27 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
177
7
0
25 Feb 2025
Qwen2.5-VL Technical Report
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
433
699
0
20 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Hongsheng Li
MLLMLRM
121
38
0
13 Feb 2025
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Ahmed Masry
Juan A. Rodriguez
Tianyu Zhang
Suyuchen Wang
Chao Wang
...
I. Laradji
David Vazquez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
96
0
0
03 Feb 2025
RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs
RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs
Hongliang Li
Jiaxin Zhang
Wenhui Liao
Dezhi Peng
Kai Ding
Lianwen Jin
OffRLMQ
144
0
0
31 Jan 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin
Mingbao Lin
Luxi Lin
Rongrong Ji
108
24
0
28 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
252
134
0
10 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Xinzhe Ni
Zicheng Lin
...
Yiyao Yu
C. Shi
Ruihang Chu
Jin Zeng
Yujiu Yang
LRM
214
25
0
08 Jan 2025
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Run Luo
Ting-En Lin
Jun Wang
Yuchuan Wu
Xiong Liu
...
Lei Zhang
Yushen Chen
Xiaobo Xia
Hamid Alinejad-Rokny
Fei Huang
VLMAuLLM
148
0
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
195
25
0
07 Jan 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang
Yuchang Su
Yiming Liu
Xiaohan Wang
James Burgess
...
Josiah Aklilu
Alejandro Lozano
Anjiang Wei
Ludwig Schmidt
Serena Yeung-Levy
149
5
0
06 Jan 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
Beichen Zhang
Yuhong Liu
Xiaoyi Dong
Yuhang Zang
Pan Zhang
Haodong Duan
Yuhang Cao
Dahua Lin
Jinqiao Wang
LRMReLM
149
6
0
06 Jan 2025
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video
  Understanding
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen
Yicheng Liu
Yifei Huang
Yuping He
Baoqi Pei
Jilan Xu
Yali Wang
Tong Lu
Limin Wang
ELM
180
16
0
16 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image
  Pre-training
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLMCLIP
121
4
0
30 Nov 2024
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLMCoGe
194
32
0
26 Nov 2024
MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective
MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective
Hailang Huang
Yong Wang
Zixuan Huang
Huaqiu Li
Tongwen Huang
Xiangxiang Chu
Richong Zhang
MLLMLM&MAEGVM
133
1
0
21 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
137
93
1
15 Nov 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
139
2
0
25 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
Devendra Singh Dhami
Constantin Rothkopf
Kristian Kersting
CoGeReLMVLMLRM
172
3
0
25 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDaVLMMLLM
118
29
0
24 Oct 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large
  Vision-Language Models
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Haodong Duan
Zeang Sheng
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
105
12
0
23 Oct 2024
Previous
12345
Next