The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework

25 May 2025
Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong
Main: 8 pages, 4 figures, 6 tables; bibliography: 2 pages
Abstract

Our research reveals a new privacy risk associated with the vision-language model (VLM) agentic framework: the ability to infer sensitive attributes (e.g., age and health information) and even abstract ones (e.g., personality and social traits) from a set of personal images, which we term "image private attribute profiling." This threat is particularly severe given that modern apps can easily access users' photo albums, and inference from image sets enables models to exploit inter-image relations for more sophisticated profiling. However, two main challenges hinder our understanding of how well VLMs can profile an individual from a few personal photos: (1) the lack of benchmark datasets with multi-image annotations for private attributes, and (2) the limited ability of current multimodal large language models (MLLMs) to infer abstract attributes from large image collections. In this work, we construct PAPI, the largest dataset for studying private attribute profiling in personal images, comprising 2,510 images from 251 individuals with 3,012 annotated privacy attributes. We also propose HolmesEye, a hybrid agentic framework that combines VLMs and LLMs to enhance privacy inference. HolmesEye uses VLMs to extract both intra-image and inter-image information and LLMs to guide the inference process as well as consolidate the results through forensic analysis, overcoming existing limitations in long-context visual reasoning. Experiments reveal that HolmesEye achieves a 10.8% improvement in average accuracy over state-of-the-art baselines and surpasses human-level performance by 15.0% in predicting abstract attributes. This work highlights the urgency of addressing privacy risks in image-based profiling and offers both a new dataset and an advanced framework to guide future research in this area.
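The abstract describes a two-model division of labor: VLMs extract intra-image and inter-image evidence, and an LLM consolidates that evidence into attribute predictions via forensic-style reasoning. The structure of such a loop can be sketched as below; this is an illustrative outline under assumed interfaces, not the paper's actual implementation — `HolmesEyePipeline`, `toy_vlm`, and `toy_llm` are hypothetical names, and the callables stand in for real model back-ends.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HolmesEyePipeline:
    """Structural sketch of a VLM+LLM agentic profiling loop.

    `vlm` and `llm` are placeholder callables standing in for real
    model back-ends; all names here are illustrative assumptions.
    """
    vlm: Callable[[str], str]                    # evidence extractor
    llm: Callable[[List[str]], Dict[str, str]]   # consolidating reasoner

    def profile(self, images: List[str]) -> Dict[str, str]:
        # Stage 1: intra-image evidence, one VLM call per image.
        intra = [self.vlm(img) for img in images]
        # Stage 2: inter-image evidence, one VLM pass over the whole set,
        # so relations between images can be exploited.
        inter = self.vlm(" | ".join(images))
        # Stage 3: the LLM consolidates all evidence into predictions.
        return self.llm(intra + [inter])


# Toy stand-ins so the sketch runs without any model back-end.
def toy_vlm(prompt: str) -> str:
    return f"evidence({prompt})"


def toy_llm(evidence: List[str]) -> Dict[str, str]:
    return {"age": "unknown", "evidence_items": str(len(evidence))}


result = HolmesEyePipeline(toy_vlm, toy_llm).profile(["img1", "img2"])
```

With two images the consolidator receives three evidence items (two intra-image, one inter-image), mirroring the abstract's claim that set-level inference adds information beyond per-image analysis.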

@article{liu2025_2505.19139,
  title={The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework},
  author={Feiran Liu and Yuzhe Zhang and Xinyi Huang and Yinan Peng and Xinfeng Li and Lixu Wang and Yutong Shen and Ranjie Duan and Simeng Qin and Xiaojun Jia and Qingsong Wen and Wei Dong},
  journal={arXiv preprint arXiv:2505.19139},
  year={2025}
}