arXiv:2407.15894
Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning

22 July 2024
Jingchen Sun
Rohan Sharma
Vishnu Suresh Lokhande
Changyou Chen
Abstract

Prompt tuning has emerged as a prominent research paradigm for adapting vision-language models to various downstream tasks. However, recent research indicates that prompt tuning methods often overfit due to limited training samples. In this paper, we propose a Cross-modal Aligned Feature Tuning (Craft) method to address this issue. Cross-modal alignment is conducted by first selecting anchors from the alternative domain and deriving relative representations of the embeddings with respect to the selected anchors. Optimizing a feature alignment loss over the anchor-aligned text and image modalities creates a more unified text-image common space. Overfitting in prompt tuning also degrades model performance on out-of-distribution samples. To further improve the robustness of the prompt model, we propose minimizing the Maximum Mean Discrepancy (MMD) over the anchor-aligned feature spaces to mitigate domain shift. Experiments on four different prompt tuning structures consistently show the improvement of our method, with gains of up to 6.1% on the Base-to-Novel generalization task, 5.8% on the group robustness task, and 2.7% on out-of-distribution tasks. The code will be available at https://github.com/Jingchensun/Craft
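The core pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the use of cosine similarity for the relative representations, a mean-squared alignment loss, and a biased RBF-kernel MMD estimator are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def relative_representation(embeddings, anchors):
    # Represent each embedding by its cosine similarity to a set of anchors
    # (an assumed instantiation of the paper's "relative representations").
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return e @ a.T  # shape: (n_samples, n_anchors)

def alignment_loss(text_rel, image_rel):
    # Mean squared distance between anchor-aligned text and image features;
    # minimizing it pulls the two modalities into a common space.
    return float(np.mean((text_rel - image_rel) ** 2))

def mmd_rbf(x, y, gamma=1.0):
    # Biased empirical estimate of squared MMD with an RBF kernel,
    # used here as a proxy for the domain-shift penalty.
    def k(a, b):
        sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * sq)
    return float(k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean())

# Example with random stand-ins for CLIP-style embeddings:
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))   # 8 text embeddings, dim 16
image_emb = rng.normal(size=(8, 16))  # 8 image embeddings, dim 16
anchors = rng.normal(size=(4, 16))    # 4 anchors from the alternative domain

text_rel = relative_representation(text_emb, anchors)
image_rel = relative_representation(image_emb, anchors)
loss = alignment_loss(text_rel, image_rel) + mmd_rbf(text_rel, image_rel)
```

In the actual method these quantities would be computed on learned prompt-conditioned features and minimized jointly with the task loss during prompt tuning.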
