PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI

19 May 2025
Yingchen He
Christian D. Weilbach
Martyna E. Wojciechowska
Yuxuan Zhang
Frank Wood
    LM&Ro
    VGen
Abstract

Advances in deep generative modelling have made it increasingly plausible to train human-level embodied agents. Yet progress has been limited by the absence of large-scale, real-time, multi-modal, and socially interactive datasets that reflect the sensory-motor complexity of natural environments. To address this, we present PLAICraft, a novel data collection platform and dataset capturing multiplayer Minecraft interactions across five time-aligned modalities: video, game output audio, microphone input audio, mouse, and keyboard actions. Each modality is logged with millisecond time precision, enabling the study of synchronous, embodied behaviour in a rich, open-ended world. The dataset comprises over 10,000 hours of gameplay from more than 10,000 global participants. (We have done a privacy review for the public release of an initial 200-hour subset of the dataset, with plans to release most of the dataset over time.) Alongside the dataset, we provide an evaluation suite for benchmarking model capabilities in object recognition, spatial awareness, language grounding, and long-term memory. PLAICraft opens a path toward training and evaluating agents that act fluently and purposefully in real time, paving the way for truly embodied artificial intelligence.
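
Since the abstract describes each of the five modalities as being logged with millisecond time precision, a natural consumer-side view is a per-modality event stream resampled onto a shared clock. The sketch below is only an illustrative assumption: the actual PLAICraft schema, field names, and any alignment utilities are not specified in this listing, so TimedEvent, AlignedSample, and align are hypothetical.

# Hypothetical sketch of consuming time-aligned multi-modal logs.
# All names and the fixed-step alignment strategy are assumptions, not the PLAICraft API.
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedEvent:
    """A single logged event from one modality, stamped in milliseconds."""
    timestamp_ms: int
    payload: bytes  # e.g. an encoded video frame, audio chunk, or input event


@dataclass
class AlignedSample:
    """One time slice holding the most recent event from each of the five modalities."""
    timestamp_ms: int
    video: Optional[TimedEvent]
    game_audio: Optional[TimedEvent]
    mic_audio: Optional[TimedEvent]
    mouse: Optional[TimedEvent]
    keyboard: Optional[TimedEvent]


def latest_before(stream: List[TimedEvent], t_ms: int) -> Optional[TimedEvent]:
    """Return the last event at or before t_ms; the stream must be sorted by timestamp."""
    idx = bisect_right([e.timestamp_ms for e in stream], t_ms)
    return stream[idx - 1] if idx > 0 else None


def align(video, game_audio, mic_audio, mouse, keyboard,
          step_ms: int = 50) -> List[AlignedSample]:
    """Resample five per-modality event streams onto a shared clock with a fixed step."""
    streams = [video, game_audio, mic_audio, mouse, keyboard]
    end = max(s[-1].timestamp_ms for s in streams if s)
    samples = []
    for t in range(0, end + 1, step_ms):
        samples.append(AlignedSample(t, *(latest_before(s, t) for s in streams)))
    return samples

A fixed 50 ms step is an arbitrary illustrative choice; with millisecond timestamps one could equally align on video frame boundaries or keep the raw asynchronous event streams.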

@article{he2025_2505.12707,
  title={PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI},
  author={Yingchen He and Christian D. Weilbach and Martyna E. Wojciechowska and Yuxuan Zhang and Frank Wood},
  journal={arXiv preprint arXiv:2505.12707},
  year={2025}
}