Open Set Domain Adaptation with Vision-language models via Gradient-aware Separation

Abstract

Open-Set Domain Adaptation (OSDA) confronts the dual challenge of aligning known-class distributions across domains while identifying target-domain-specific unknown categories. Current approaches often fail to leverage semantic relationships between modalities and struggle with error accumulation in unknown-sample detection. We propose to harness Contrastive Language-Image Pretraining (CLIP) to address these limitations through two key innovations: 1) Prompt-driven cross-domain alignment: learnable textual prompts, conditioned on domain-discrepancy metrics, dynamically adapt CLIP's text encoder, enabling semantic consistency between source and target domains without explicit unknown-class supervision. 2) Gradient-aware open-set separation: a gradient-analysis module quantifies domain shift by comparing the L2-norm of gradients from the learned prompts, where known and unknown samples exhibit statistically distinct gradient behaviors. Evaluations on the Office-Home benchmark show that our method consistently outperforms both the CLIP baseline and standard OSDA baselines. Ablation studies confirm the critical role of the gradient-norm criterion.
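The gradient-norm intuition can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a linear softmax head stands in for CLIP's prompt-conditioned text embeddings, and the gradient is taken with respect to the class prototypes using each sample's own pseudo-label. A sample that matches a known-class prototype produces a near-zero gradient, while a sample equidistant from all prototypes (a candidate unknown) produces a large one:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_norm(x, prototypes):
    """L2-norm of the cross-entropy gradient w.r.t. the class
    prototypes, using the sample's argmax pseudo-label.
    (Toy stand-in for gradients through learned prompts.)"""
    logits = prototypes @ x            # (K,) similarity scores
    p = softmax(logits)
    y = np.eye(len(p))[p.argmax()]     # one-hot pseudo-label
    g = np.outer(p - y, x)             # dL/dW for cross-entropy
    return np.linalg.norm(g)

# Three known-class prototypes (identity matrix for clarity).
W = np.eye(3)
known = np.array([5.0, 0.0, 0.0])      # strongly aligned with class 0
unknown = np.array([1.0, 1.0, 1.0])    # equidistant from all classes

assert grad_norm(known, W) < grad_norm(unknown, W)
```

Thresholding such per-sample gradient norms would then separate known from unknown target samples; the threshold itself (and how gradients flow through real prompt parameters) is a design choice the sketch does not model.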

@article{chen2025_2505.13507,
  title={Open Set Domain Adaptation with Vision-language models via Gradient-aware Separation},
  author={Haoyang Chen},
  journal={arXiv preprint arXiv:2505.13507},
  year={2025}
}