ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.07622
34
15

Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval

13 November 2023
Junyang Chen
Hanjiang Lai
    VLM
ArXivPDFHTML
Abstract

Zero-shot composed image retrieval (ZS-CIR), which takes a textual modification and a reference image as a query to retrieve a target image without triplet labeling, has gained more and more attention in data mining. Current ZS-CIR research mainly relies on the generalization ability of pre-trained vision-language models, e.g., CLIP. However, the pre-trained vision-language models and CIR tasks have substantial discrepancies, where the vision-language models focus on learning the similarities but CIR aims to learn the modifications of the image guided by text. In this paper, we introduce a novel unlabeled and pre-trained masked tuning approach, which reduces the gap between the pre-trained vision-language model and the downstream CIR task. First, to reduce the gap, we reformulate the contrastive learning of the vision-language model as the CIR task, where we randomly mask input image patches to generate ⟨\langle⟨masked image, text, image⟩\rangle⟩ triplet from an image-text pair. Then, we propose a simple but novel pre-trained masked tuning method, which uses the text and the masked image to learn the modifications of the original image. With such a simple design, the proposed masked tuning can learn to better capture fine-grained text-guided modifications. Extensive experimental results demonstrate the significant superiority of our approach over the baseline models on four ZS-CIR datasets, including FashionIQ, CIRR, CIRCO, and GeneCIS. Our codes are available atthis https URL

View on arXiv
@article{chen2025_2311.07622,
  title={ Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval },
  author={ Junyang Chen and Hanjiang Lai },
  journal={arXiv preprint arXiv:2311.07622},
  year={ 2025 }
}
Comments on this paper