Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

12 June 2025
Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, Liqiang Nie
Main: 9 pages · 5 figures · 5 tables · Bibliography: 3 pages · Appendix: 8 pages
Abstract

Recent efforts to leverage Multimodal Large Language Models (MLLMs) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to tackle the issue of insufficient knowledge. It progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we constructed a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32%, 19%, 15%, and 79% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: this https URL
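To make the three-level abstraction concrete, the following is a minimal sketch of how the HMS hierarchy described in the abstract could be organized as data structures: trajectories become execution skills, which are grouped into core skills, which are in turn indexed by meta-skills used for long-horizon planning. All class names, fields, and the keyword-based retrieval are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the Hierarchical Multimodal Skills (HMS) structure.
# Names and fields are illustrative only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExecutionSkill:
    """Lowest level: a concrete action trajectory for one sub-goal."""
    name: str
    actions: List[str]                                      # e.g. ["tap(search_box)", "type('hotels')"]
    screenshots: List[str] = field(default_factory=list)    # paths to GUI frames


@dataclass
class CoreSkill:
    """Middle level: a reusable routine abstracted from several execution skills."""
    name: str
    description: str
    execution_skills: List[ExecutionSkill] = field(default_factory=list)


@dataclass
class MetaSkill:
    """Top level: cross-task knowledge consulted during long-horizon planning."""
    name: str
    description: str
    core_skills: List[CoreSkill] = field(default_factory=list)

    def retrieve(self, goal: str) -> List[CoreSkill]:
        # Naive keyword match standing in for the paper's retrieval step.
        words = goal.lower().split()
        return [s for s in self.core_skills
                if any(w in s.description.lower() for w in words)]
```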

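The abstract states that SA-MCTS uses offline-acquired skills to shrink the action search space during online tree exploration. The sketch below is one plausible reading of that idea: during expansion, actions suggested by retrieved skills are tried first, falling back to the full GUI action space only when no skill applies. This is a generic UCB1-based MCTS with a skill-biased expansion step, not the authors' algorithm; every function name and parameter is an assumption.

```python
# Hypothetical skill-augmented MCTS sketch: skill-suggested actions narrow the
# expansion step. Standard UCB1 selection, random rollout child, backpropagation.
import math
import random
from typing import Callable, List, Optional


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children: List["Node"] = []
        self.visits = 0
        self.value = 0.0


def ucb(parent: Node, child: Node, c: float) -> float:
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits + 1) / child.visits)


def sa_mcts(root_state,
            skill_actions: Callable[[object], List[str]],  # actions proposed by offline skills (assumed)
            all_actions: Callable[[object], List[str]],    # full GUI action space (assumed)
            step: Callable[[object, str], object],
            reward: Callable[[object], float],
            iterations: int = 100,
            c: float = 1.4) -> Optional[str]:
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB1 until a leaf is reached.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb(parent, ch, c))
        # Expansion: prefer skill-suggested actions; fall back to free exploration.
        candidates = skill_actions(node.state) or all_actions(node.state)
        for a in candidates:
            node.children.append(Node(step(node.state, a), parent=node, action=a))
        # Evaluation + backpropagation on one child (or the leaf itself).
        leaf = random.choice(node.children) if node.children else node
        r = reward(leaf.state)
        while leaf:
            leaf.visits += 1
            leaf.value += r
            leaf = leaf.parent
    best = max(root.children, key=lambda n: n.visits, default=None)
    return best.action if best else None
```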
View on arXiv
@article{xie2025_2506.10387,
  title={Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills},
  author={Yuquan Xie and Zaijing Li and Rui Shao and Gongwei Chen and Kaiwen Zhou and Yinchuan Li and Dongmei Jiang and Liqiang Nie},
  journal={arXiv preprint arXiv:2506.10387},
  year={2025}
}