arXiv:2301.06429

Linguistic Query-Guided Mask Generation for Referring Image Segmentation

16 January 2023
Zhichao Wei
Xiaohao Chen
Mingqiang Chen
Siyu Zhu
    VLM
Abstract

Referring image segmentation aims to segment the image region of interest according to a given language expression, and is a typical multi-modal task. Existing methods adopt either a pixel classification-based or a learnable query-based framework for mask generation. Both rely on a fixed number of parametric prototypes, which is insufficient to handle the variety of text-image pairs. In this work, we propose an end-to-end framework built on a transformer to perform Linguistic query-Guided mask generation, dubbed LGFormer. It treats the linguistic features as the query to generate a specialized prototype for an arbitrary input image-text pair, thus producing more consistent segmentation results. Moreover, we design several cross-modal interaction modules (e.g., the vision-language bidirectional attention module, VLBA) in both the encoder and the decoder to achieve better cross-modal alignment.
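The core idea, a prototype generated from the language expression rather than a fixed learned one, can be illustrated with a minimal sketch. This is not the LGFormer implementation: the mean pooling of word features, the single-head dot-product attention, the sigmoid threshold, and all function names below are assumptions chosen for illustration, and the toy features are hand-written 2-dimensional vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linguistic_query(word_feats):
    """Mean-pool word features (T x C) into one query vector (C).
    Pooling is an assumption made for this sketch."""
    t = len(word_feats)
    return [sum(col) / t for col in zip(*word_feats)]

def generate_prototype(query, pixel_feats):
    """Cross-attention: the linguistic query attends over pixel
    features (N x C) and returns their weighted sum as a prototype (C)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, p) / scale for p in pixel_feats])
    return [sum(w * p[i] for w, p in zip(weights, pixel_feats))
            for i in range(len(query))]

def predict_mask(word_feats, pixel_feats, threshold=0.5):
    """Score every pixel against the input-specific prototype and threshold."""
    prototype = generate_prototype(linguistic_query(word_feats), pixel_feats)
    probs = [1.0 / (1.0 + math.exp(-dot(p, prototype))) for p in pixel_feats]
    return [1 if pr > threshold else 0 for pr in probs]

# Toy "image" of four pixels with 2-dimensional features (hypothetical values).
pixel_feats = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [-1.0, 0.0]]

# Two different "expressions" over the same image.
expr_a = [[1.0, 0.0], [1.0, 0.0]]
expr_b = [[-1.0, 0.0]]

print(predict_mask(expr_a, pixel_feats))  # -> [1, 1, 0, 0]
print(predict_mask(expr_b, pixel_feats))  # -> [0, 0, 1, 1]
```

The same image yields different masks for the two expressions because the prototype is recomputed from each text input, whereas a fixed set of parametric prototypes would score the pixels identically regardless of the expression.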
