ResearchTrend.AI
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs

27 May 2025
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei

Papers citing "AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs"

50 / 53 papers shown
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
144
430
0
20 Feb 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
68
126
0
29 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
218
1,503
0
22 Jan 2025
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Cheng Chen
Hui Tan
Boheng Chen
...
Aojun Zhou
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
33
9
0
16 Nov 2024
Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements
Silvia Terragni
Hoang Cuong
Joachim Daiber
Pallavi Gudipati
Pablo N. Mendes
27
1
0
25 Oct 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
126
750
0
25 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
59
25
0
24 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
95
89
0
17 Oct 2024
H2OVL-Mississippi Vision Language Models Technical Report
Shaikat Galib
Shanshan Wang
Guanshuo Xu
Pascal Pfeiffer
Ryan Chesler
Mark Landry
Sri Satish Ambati
MLLM
VLM
28
4
0
17 Oct 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Mucong Ding
Chenghao Deng
Jocelyn Choo
Zichu Wu
Aakriti Agrawal
...
Dinesh Manocha
Tom Goldstein
John Langford
Anima Anandkumar
Furong Huang
69
7
0
27 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM
VLM
61
58
0
25 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
60
66
0
17 Sep 2024
POINTS: Improving Your Vision-language Model with Affordable Strategies
Yuan Liu
Zhongyin Zhao
Ziyuan Zhuang
Le Tian
Xiao Zhou
Jie Zhou
VLM
57
7
0
07 Sep 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
90
61
0
28 Aug 2024
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
Khang T. Doan
Bao G. Huynh
D. T. Hoang
Thuc D. Pham
Nhat H. Pham
Quan T.M. Nguyen
Bang Q. Vo
Suong N. Hoang
MLLM
46
5
0
22 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM
MLLM
59
403
0
03 Aug 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
97
142
0
16 Jul 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
Liwen Wang
Rong Jin
61
41
0
12 Jun 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
74
166
0
03 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLM
MLLM
43
111
0
02 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
83
578
0
25 Apr 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying
Fanqing Meng
Jin Wang
Zhiqiang Li
Han Lin
...
Yali Wang
Yuning Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao
38
88
0
24 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM
ALM
90
1,136
0
22 Apr 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM
LRM
MLLM
50
133
0
18 Apr 2024
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo
Lucas Weber
Leshem Choshen
Yuekai Sun
Gongjun Xu
Mikhail Yurochkin
ELM
60
85
0
22 Feb 2024
Efficient Multimodal Learning from Data-centric Perspective
Muyang He
Yexin Liu
Boya Wu
Jianhao Yuan
Yueze Wang
Tiejun Huang
Bo Zhao
MLLM
50
86
0
18 Feb 2024
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
Xuanjing Huang
LLMAG
58
32
0
18 Feb 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
Zheqi He
Xinya Wu
Pengfei Zhou
Richeng Xuan
Guang Liu
Xi Yang
Qiannan Zhu
Hua Huang
ELM
LRM
39
15
0
25 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
211
1,036
0
21 Dec 2023
Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction
Unggi Lee
Sungjun Yoon
Joon Seo Yun
Kyoungsoo Park
YoungHoon Jung
Damji Stratton
Hyeoncheol Kim
47
9
0
19 Dec 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
148
833
0
27 Nov 2023
Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks
Ling Luo
Jinzhong Ning
Yingwen Zhao
Zhijun Wang
Zeyuan Ding
...
Yuqi Liu
Zhihao Yang
Jian Wang
Shengdi Yin
Hongfei Lin
LM&MA
115
53
0
20 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
63
259
0
11 Nov 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
89
2,593
0
05 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
58
15
0
04 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
72
541
0
03 Oct 2023
Anchor Points: Benchmarking Models with Much Fewer Examples
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
68
25
0
14 Sep 2023
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
39
25
0
22 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
68
534
0
30 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
44
992
0
12 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
78
735
0
26 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
62
806
0
23 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
51
167
0
15 Jun 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
345
4,607
0
17 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
647
13,788
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
815
12,840
0
27 Feb 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
228
1,188
0
20 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
582
9,009
0
28 Jan 2022
Comparing Test Sets with Item Response Theory
Clara Vania
Phu Mon Htut
William Huang
Dhara Mungra
Richard Yuanzhe Pang
Jason Phang
Haokun Liu
Kyunghyun Cho
Sam Bowman
41
41
0
01 Jun 2021
Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting
Yi Cheng
Siyao Li
Bang Liu
Ruihui Zhao
Sujian Li
Chenghua Lin
Yefeng Zheng
75
37
0
25 May 2021