Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation

28 May 2025
Mehrdad Noori
David Osowiechi
G. A. V. Hakim
Ali Bahri
Moslem Yazdanpanah
Sahar Dastani
Farzad Beizaee
Ismail Ben Ayed
Christian Desrosiers
    VLM
    TTA
Abstract

Test-time adaptation (TTA) has recently attracted wide interest in the context of vision-language models (VLMs) for image classification. However, to the best of our knowledge, the problem has been completely overlooked in dense prediction tasks such as Open-Vocabulary Semantic Segmentation (OVSS). In response, we propose a novel TTA method tailored to adapting VLMs for segmentation at test time. Unlike TTA methods for image classification, our Multi-Level and Multi-Prompt (MLMP) entropy minimization integrates features from intermediate vision-encoder layers and is performed with different text-prompt templates at both the global CLS-token and local pixel-wise levels. Our approach can be used as a plug-and-play module for any segmentation network, does not require additional training data or labels, and remains effective even with a single test sample. Furthermore, we introduce a comprehensive OVSS TTA benchmark suite that integrates a rigorous evaluation protocol, seven segmentation datasets, and 15 common corruptions, for a total of 82 distinct test scenarios, establishing a standardized testbed for future TTA research in open-vocabulary segmentation. Our experiments on this suite demonstrate that our segmentation-tailored method consistently delivers significant gains over direct adoption of TTA classification baselines.
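To make the idea of entropy minimization over several prompt templates and two spatial levels more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the abstract does not specify the exact loss, how the local and global terms are weighted, or how intermediate-layer features are fused, so the function name `multi_prompt_entropy`, the equal weighting of the two terms, and the simple averaging over templates are all assumptions made for illustration. The fusion of intermediate vision-encoder layers is omitted entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def multi_prompt_entropy(pixel_logits, cls_logits):
    """Hypothetical objective: average, over prompt templates, of the
    mean pixel-wise entropy plus the entropy of the global CLS prediction.

    pixel_logits[t][i] -> class logits for pixel i under template t
    cls_logits[t]      -> class logits for the CLS token under template t

    Assumption: local and global terms are weighted equally.
    """
    per_template = []
    for pix, cls in zip(pixel_logits, cls_logits):
        local_term = sum(entropy(softmax(l)) for l in pix) / len(pix)
        global_term = entropy(softmax(cls))
        per_template.append(local_term + global_term)
    return sum(per_template) / len(per_template)


# Toy example: 2 templates, 2 pixels, 3 classes.
uncertain = multi_prompt_entropy(
    [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]] * 2,
    [[0.0, 0.0, 0.0]] * 2,
)
confident = multi_prompt_entropy(
    [[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]] * 2,
    [[10.0, 0.0, 0.0]] * 2,
)
print(uncertain > confident)
```

In a test-time adaptation loop, a quantity of this kind would be minimized with respect to a small set of model parameters on each incoming test sample; which parameters are updated is not stated in the abstract and is left open here.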

@article{noori2025_2505.21844,
  title={Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation},
  author={Mehrdad Noori and David Osowiechi and Gustavo Adolfo Vargas Hakim and Ali Bahri and Moslem Yazdanpanah and Sahar Dastani and Farzad Beizaee and Ismail Ben Ayed and Christian Desrosiers},
  journal={arXiv preprint arXiv:2505.21844},
  year={2025}
}