ResearchTrend.AI

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities

10 June 2025
Wendong Bu
Yang Wu
Qifan Yu
Minghe Gao
Bingchen Miao
Zhenkui Zhang
Kaihang Pan
Yunfei Li
Mengze Li
Wei Ji
Juncheng Billy Li
Siliang Tang
Yueting Zhuang
Abstract

As multimodal large language models (MLLMs) advance, MLLM-based virtual agents have demonstrated remarkable performance. However, existing benchmarks face significant limitations, including uncontrollable task complexity, extensive manual annotation with limited scenarios, and a lack of multidimensional evaluation. In response to these challenges, we introduce OmniBench, a self-generating, cross-platform, graph-based benchmark with an automated pipeline for synthesizing tasks of controllable complexity through subtask composition. To evaluate the diverse capabilities of virtual agents on the graph, we further present OmniEval, a multidimensional evaluation framework that includes subtask-level evaluation, graph-based metrics, and comprehensive tests across 10 capabilities. Our synthesized dataset contains 36k graph-structured tasks across 20 scenarios, achieving a 91% human acceptance rate. Training on our graph-structured data shows that it can more efficiently guide agents compared to manually annotated data. We conduct multidimensional evaluations for various open-source and closed-source models, revealing their performance across various capabilities and paving the way for future advancements. Our project is available at this https URL.
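The abstract's idea of composing subtasks into a graph-structured task, and then scoring an agent at the subtask level, can be illustrated with a minimal sketch. This is not OmniBench's or OmniEval's actual code or API: the `TaskGraph` class, the `subtask_score` function, the subtask names, and the choice of longest dependency chain as a complexity measure are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskGraph:
    """Hypothetical task representation: a DAG of named subtasks."""

    # Maps each subtask to the subtasks it depends on.
    deps: dict = field(default_factory=dict)

    def add_subtask(self, name: str, depends_on: Optional[list] = None) -> None:
        parents = list(depends_on or [])
        for parent in parents:
            # Parents must already exist, so the graph stays acyclic
            # and insertion order is a valid topological order.
            if parent not in self.deps:
                raise ValueError(f"unknown dependency: {parent}")
        self.deps[name] = parents

    def depth(self) -> int:
        """Length of the longest dependency chain, counted in subtasks."""
        longest = {}
        for name, parents in self.deps.items():
            longest[name] = 1 + max((longest[p] for p in parents), default=0)
        return max(longest.values(), default=0)


def subtask_score(graph: TaskGraph, completed: set) -> float:
    """Fraction of subtasks credited (assumed scoring rule, not the paper's).

    A subtask is credited only if the agent completed it and every
    subtask it depends on was also credited.
    """
    credited = set()
    for name, parents in graph.deps.items():
        if name in completed and all(p in credited for p in parents):
            credited.add(name)
    return len(credited) / len(graph.deps) if graph.deps else 0.0


# Compose a task from four illustrative subtasks.
graph = TaskGraph()
graph.add_subtask("open_app")
graph.add_subtask("search_file", ["open_app"])
graph.add_subtask("open_settings", ["open_app"])
graph.add_subtask("share_file", ["search_file", "open_settings"])

score = subtask_score(graph, {"open_app", "search_file", "share_file"})
print(graph.depth(), score)
```

In this sketch, adding more subtasks or longer dependency chains raises the task's complexity in a controlled way, and the score gives partial credit where a single pass/fail outcome would not. Here `share_file` earns no credit because its prerequisite `open_settings` was never completed.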

@article{bu2025_2506.08933,
  title={What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities},
  author={Wendong Bu and Yang Wu and Qifan Yu and Minghe Gao and Bingchen Miao and Zhenkui Zhang and Kaihang Pan and Yunfei Li and Mengze Li and Wei Ji and Juncheng Li and Siliang Tang and Yueting Zhuang},
  journal={arXiv preprint arXiv:2506.08933},
  year={2025}
}
Main: 9 Pages
Bibliography: 2 Pages
Appendix: 14 Pages
14 Figures
12 Tables