ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2506.13205
50
0
v1v2v3v4 (latest)

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

16 June 2025
Xuan Wang
Siyuan Liang
Zhe Liu
Yi Yu
Yuliang Lu
Xiaochun Cao
Ee-Chien Chang
X. Gao
    AAML
ArXiv (abs)PDFHTML
Main:10 Pages
5 Figures
Bibliography:2 Pages
Abstract

With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon VLMs. Our method manipulates only the visual inputs of a portion of the training samples - without altering their corresponding labels or instructions - thereby injecting malicious behaviors into the model. Once fine-tuned with this tampered data, the agent will exhibit attacker-controlled responses when a specific visual trigger is introduced at inference time. The core of our approach lies in aligning the gradients of poisoned samples with those of a chosen target instance, embedding backdoor-relevant features into the poisoned training data. To maintain stealth and enhance robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use. Results show that our attack achieves high attack success rates (up to 94.67 percent) while maintaining high clean-task performance (FSR up to 95.85 percent). Additionally, ablation studies shed light on how various design choices affect the efficacy and concealment of the attack. Overall, this work is the first to expose critical security flaws in VLM-based mobile agents, highlighting their susceptibility to clean-label backdoor attacks and the urgent need for effective defense mechanisms in their training pipelines.

View on arXiv
@article{wang2025_2506.13205,
  title={ Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments },
  author={ Xuan Wang and Siyuan Liang and Zhe Liu and Yi Yu and Yuliang Lu and Xiaochun Cao and Ee-Chien Chang and Xitong Gao },
  journal={arXiv preprint arXiv:2506.13205},
  year={ 2025 }
}
Comments on this paper