Visual Variational Autoencoder Prompt Tuning

22 March 2025
Xi Xiao
Yunbei Zhang
Yanshuh Li
Xingjian Li
Tianyang Wang
Jihun Hamm
Xiao Wang
Min Xu
Abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a crucial approach for adapting large vision transformers to downstream tasks without the prohibitive computational costs of full fine-tuning. While existing visual prompt tuning (VPT) methods have made significant strides, they predominantly rely on static, domain-specific prompts that fail to capture the rich visual diversity within individual instances. This paper introduces V²APT (Visual Variational Autoencoder Prompt Tuning), a novel framework that generates dynamic, input-dependent prompts using a variational autoencoder architecture. By learning a latent representation of image-specific features and decoding them into customized prompts, V²APT adapts to the unique visual characteristics of each input. Extensive experiments on the FGVC, HTA, and VTAB-1k benchmarks demonstrate that our approach consistently outperforms state-of-the-art PEFT methods. Notably, V²APT achieves a +3.2% improvement over VPT-Deep on HTA, with an average performance gain of +2.0% across all three benchmarks.
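
To make the mechanism concrete, below is a minimal PyTorch sketch of the idea the abstract describes: pooled image features are encoded into a Gaussian latent, a sample is drawn via the reparameterization trick, and the sample is decoded into prompt tokens that are prepended to the frozen backbone's patch embeddings, VPT-style. This is an illustration under assumptions, not the authors' released code; the class name `VAEPromptGenerator`, the dimensions, and the mean-pooled image summary are all hypothetical choices.

```python
import torch
import torch.nn as nn

class VAEPromptGenerator(nn.Module):
    """Illustrative V²APT-style module: image-conditioned prompts via a VAE.

    All names and sizes here are assumptions for the sketch, not the paper's.
    """
    def __init__(self, feat_dim=768, latent_dim=64, num_prompts=10):
        super().__init__()
        # Encoder: pooled patch features -> parameters of a Gaussian posterior.
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        # Decoder: latent sample -> a set of input-dependent prompt tokens.
        self.decoder = nn.Linear(latent_dim, num_prompts * feat_dim)
        self.num_prompts = num_prompts
        self.feat_dim = feat_dim

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, D) patch embeddings from a frozen ViT backbone.
        pooled = patch_tokens.mean(dim=1)                  # (B, D) image summary
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        prompts = self.decoder(z).view(-1, self.num_prompts, self.feat_dim)
        # KL divergence to a standard normal prior, added to the task loss.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return prompts, kl

# Usage sketch: prepend the generated prompts to the patch tokens before
# the transformer blocks, as in VPT, while the backbone stays frozen.
gen = VAEPromptGenerator()
tokens = torch.randn(4, 196, 768)             # dummy ViT-B/16 patch embeddings
prompts, kl = gen(tokens)
augmented = torch.cat([prompts, tokens], 1)   # (4, 206, 768)
```

The key contrast with standard VPT is that the prompt tokens here are a function of the input image rather than a fixed learned parameter, which is what lets the prompts track per-instance visual variation.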

@article{xiao2025_2503.17650,
  title={Visual Variational Autoencoder Prompt Tuning},
  author={Xi Xiao and Yunbei Zhang and Yanshuh Li and Xingjian Li and Tianyang Wang and Jihun Hamm and Xiao Wang and Min Xu},
  journal={arXiv preprint arXiv:2503.17650},
  year={2025}
}