FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

26 May 2025
Atsunori Moteki
Shoichi Masui
Fan Yang
Yueqi Song
Yonatan Bisk
Graham Neubig
Ikuo Kusajima
Yasuto Watanabe
Hiroyuki Ishida
Jun Takahashi
Shan Jiang
Main: 6 pages
2 figures
Bibliography: 1 page
6 tables
Abstract

This paper proposes FieldWorkArena, a benchmark for agentic AI targeting real-world field work. As demand for agentic AI grows, agents are increasingly required to monitor and report safety and health incidents, as well as manufacturing-related incidents, that may occur in real-world work environments. Existing agentic AI benchmarks are limited to evaluating web tasks and are insufficient for evaluating agents in real-world work environments, where complexity increases significantly. We define a new action space that agentic AI should possess for real-world work environment benchmarks, and we improve the evaluation function of previous methods to assess agent performance across diverse real-world tasks. The dataset consists of videos captured on-site and documents actually used in factories and warehouses; tasks were created based on interviews with on-site workers and managers. Evaluation results confirmed that performance evaluation that accounts for the characteristics of multimodal LLMs (MLLMs) such as GPT-4o is feasible. The evaluation also identified the effectiveness and limitations of the proposed evaluation method. The complete dataset (HuggingFace) and evaluation program (GitHub) can be downloaded from the following website: this https URL.

@article{moteki2025_2505.19662,
  title={FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks},
  author={Atsunori Moteki and Shoichi Masui and Fan Yang and Yueqi Song and Yonatan Bisk and Graham Neubig and Ikuo Kusajima and Yasuto Watanabe and Hiroyuki Ishida and Jun Takahashi and Shan Jiang},
  journal={arXiv preprint arXiv:2505.19662},
  year={2025}
}