PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

25 August 2025
Xin Wang, Zhiyao Cui, Hao Li, Ya Zeng, Chenxu Wang, Ruiqi Song, Y. Chen, Kun Shao, Qiaosheng Zhang, Jinzhuo Liu, Siyue Ren, Shuyue Hu, Zhen Wang
arXiv:2508.18040 · abs · PDF · HTML · GitHub (3★)
Main: 7 pages · 6 figures · 4 tables · Bibliography: 2 pages · Appendix: 21 pages
Abstract

Vision language model (VLM)-based mobile agents show great potential for assisting users in performing instruction-driven tasks. However, these agents typically struggle with personalized instructions -- those containing ambiguous, user-specific context -- a challenge that has been largely overlooked in previous research. In this paper, we define personalized instructions and introduce PerInstruct, a novel human-annotated dataset covering diverse personalized instructions across various mobile scenarios. Furthermore, given the limited personalization capabilities of existing mobile agents, we propose PerPilot, a plug-and-play framework powered by large language models (LLMs) that enables mobile agents to autonomously perceive, understand, and execute personalized user instructions. PerPilot identifies personalized elements and autonomously completes instructions via two complementary approaches: memory-based retrieval and reasoning-based exploration. Experimental results demonstrate that PerPilot effectively handles personalized tasks with minimal user intervention and progressively improves its performance with continued use, underscoring the importance of personalization-aware reasoning for next-generation mobile agents. The dataset and code are available at: this https URL
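
To make the perceive / retrieve / explore loop described in the abstract concrete, here is a minimal sketch of that control flow. Every name below (PerPilotSketch, identify_personalized_elements, explore, complete) is a hypothetical illustration inferred only from the abstract, not the authors' actual API; see the linked repository for the real implementation.

```python
# Minimal sketch, assuming the flow stated in the abstract: flag personalized
# elements, try memory-based retrieval, fall back to reasoning-based
# exploration, then hand a concrete instruction to the base agent.
from dataclasses import dataclass, field


@dataclass
class PerPilotSketch:
    # Maps a personalized element (e.g. "my usual order") to the
    # user-specific value discovered for it.
    memory: dict[str, str] = field(default_factory=dict)

    def identify_personalized_elements(self, instruction: str) -> list[str]:
        # Stand-in for the LLM call that flags ambiguous, user-specific
        # phrases; a toy substring match replaces real perception here.
        cues = ("my usual order", "my home address")
        return [cue for cue in cues if cue in instruction]

    def explore(self, element: str) -> str:
        # Stand-in for reasoning-based exploration: the real agent would
        # inspect relevant apps/screens to discover the missing value.
        value = f"<value discovered for '{element}'>"
        self.memory[element] = value  # cache it so later runs skip exploration
        return value

    def complete(self, instruction: str) -> str:
        # Memory-based retrieval first; reasoning-based exploration otherwise.
        for element in self.identify_personalized_elements(instruction):
            value = self.memory.get(element) or self.explore(element)
            instruction = instruction.replace(element, value)
        return instruction  # now concrete enough for the base mobile agent


agent = PerPilotSketch()
print(agent.complete("Order my usual order from the coffee app"))  # explores
print(agent.complete("Reorder my usual order"))                    # from memory
```

The second call is answered from memory rather than by re-exploring, which mirrors the abstract's claim that performance improves progressively with continued use.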
