
AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding

13 April 2025
Fei Lin
Yonglin Tian
Tengchao Zhang
Jun Huang
Sangtian Guan
Fei-Yue Wang
Abstract

Unmanned Aerial Vehicles (UAVs) are increasingly important in dynamic environments such as logistics transportation and disaster response. However, current tasks often rely on human operators to monitor aerial videos and make operational decisions. This mode of human-machine collaboration suffers from significant limitations in efficiency and adaptability. In this paper, we present AirVista-II, an end-to-end agentic system for embodied UAVs, designed to enable general-purpose semantic understanding and reasoning in dynamic scenes. The system integrates agent-based task identification and scheduling, multimodal perception mechanisms, and differentiated keyframe extraction strategies tailored for various temporal scenarios, enabling the efficient capture of critical scene information. Experimental results demonstrate that the proposed system achieves high-quality semantic understanding across diverse UAV-based dynamic scenarios under a zero-shot setting.

@article{lin2025_2504.09583,
  title={AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding},
  author={Fei Lin and Yonglin Tian and Tengchao Zhang and Jun Huang and Sangtian Guan and Fei-Yue Wang},
  journal={arXiv preprint arXiv:2504.09583},
  year={2025}
}