VLT: Vision-Language Transformer and Query Generation for Referring Segmentation

28 October 2022
Henghui Ding
Chang Liu
Suchen Wang
Xudong Jiang

Papers citing "VLT: Vision-Language Transformer and Query Generation for Referring Segmentation"

50 / 84 papers shown
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
78
0
0
20 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
65
0
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
X. Li
Tao Zhang
Lu Qi
Ming Yang
30
0
0
15 Apr 2025
STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation
Dandan Shan
Zihan Li
Yunxiang Li
Qingde Li
Jie Tian
Qingqi Hong
MedIm
33
0
0
02 Apr 2025
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
Jing Liu
Wenxuan Wang
Yisi Zhang
Yepeng Tang
Xingjian He
Longteng Guo
Tongtian Yue
Xinlong Wang
ObjD
51
0
0
02 Apr 2025
AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation
Rui Li
Xiaowei Zhao
54
0
0
23 Feb 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
Yaxian Wang
Henghui Ding
Shuting He
Xudong Jiang
Bifan Wei
Jun Liu
ObjD
42
1
0
03 Jan 2025
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Y. Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
98
2
0
18 Dec 2024
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
Qingtao Pan
Wenhao Qiao
Jingjiao Lou
Bing Ji
Shuo Li
VLM
80
0
0
17 Dec 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
106
2
0
26 Nov 2024
Referring Human Pose and Mask Estimation in the Wild
Bo Miao
Mingtao Feng
Zijie Wu
Mohammed Bennamoun
Yongsheng Gao
Ajmal Saeed Mian
26
0
0
27 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
Henghui Ding
Lingyi Hong
Chang Liu
Ning Xu
L. Yang
...
Bin Cao
Yisi Zhang
Hanyi Wang
Xingjian He
Jing Liu
VOS
34
2
0
09 Sep 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
49
5
0
30 Aug 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
B. Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
42
0
0
28 Aug 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
34
6
0
30 Jul 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
52
10
0
25 Jul 2024
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He
Henghui Ding
Xudong Jiang
Bihan Wen
3DV
MLLM
3DPC
42
17
0
18 Jul 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
33
2
0
10 Jul 2024
Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy
Praveenbalaji Rajendran
Yong Yang
Thomas R. Niedermayr
Michael Gensheimer
Beth Beadle
Quynh Le
Lei Xing
Xianjin Dai
35
2
0
10 Jul 2024
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li
Chen Tang
Yuan Meng
Jiajun Fan
Zenghao Chai
Xinzhu Ma
Zhi Wang
Wenwu Zhu
31
1
0
06 Jul 2024
Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization
Hanxi Li
Jingqi Wu
Lin Yuanbo Wu
Hao Chen
Deyin Liu
Chunhua Shen
26
0
0
03 Jul 2024
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Henghui Ding
Chang Liu
Yunchao Wei
Nikhila Ravi
Shuting He
...
Bo-Lu Zhao
Jing Liu
Feiyu Pan
Hao Fang
Xiankai Lu
48
8
0
24 Jun 2024
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
Ci-Siang Lin
I-Jieh Liu
Min-Hung Chen
Chien-Yi Wang
Sifei Liu
Yu-Chiang Frank Wang
VOS
53
0
0
18 Jun 2024
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Weize Li
Zhicheng Zhao
Haochen Bai
Fei Su
38
0
0
24 May 2024
Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation
Yichen Yan
Xingjian He
Sihan Chen
Shichen Lu
Jing Liu
18
0
0
18 May 2024
HARIS: Human-Like Attention for Reference Image Segmentation
Mengxi Zhang
Heqing Lian
Yiming Liu
Jie Chen
VLM
21
0
0
17 May 2024
Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment
Kazi Shahriar Sanjid
Md. Tanzim Hossain
Md. Shakib Shahariar Junayed
M. M. Uddin
Mamba
35
2
0
26 Apr 2024
Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing
Song Xia
Yu Yi
Xudong Jiang
Henghui Ding
31
9
0
15 Apr 2024
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
Yichen Yan
Xingjian He
Sihan Chen
Jing Liu
ObjD
31
1
0
12 Apr 2024
CoReS: Orchestrating the Dance of Reasoning and Segmentation
Xiaoyi Bao
Siyang Sun
Shuailei Ma
Kecheng Zheng
Yuxin Guo
Guosheng Zhao
Yun Zheng
Xingang Wang
LRM
36
7
0
08 Apr 2024
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He
Henghui Ding
VOS
35
23
0
04 Apr 2024
Deep Instruction Tuning for Segment Anything Model
Xiaorui Huang
Gen Luo
Chaoyang Zhu
Bo Tong
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
VLM
49
1
0
31 Mar 2024
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan
Liye Jia
Fengyufan Yang
Shanliang Yao
Erick Purwanto
...
Eng Gee Lim
Jeremy S. Smith
Ka Lok Man
Xuming Hu
Yutao Yue
34
9
0
19 Mar 2024
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Zixin Zhu
Xuelu Feng
Dongdong Chen
Junsong Yuan
Chunming Qiao
Gang Hua
DiffM
37
7
0
18 Mar 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM
MLLM
ViT
71
52
0
18 Jan 2024
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi-Xin Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
34
17
0
25 Dec 2023
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment
M. Lavrenyuk
Shariq Farooq Bhat
Matthias Müller
Peter Wonka
ObjD
MDE
29
9
0
13 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
34
18
0
13 Dec 2023
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
34
50
0
06 Dec 2023
Unveiling Objects with SOLA: An Annotation-Free Image Search on the Object Level for Automotive Data Sets
Philipp Rigoll
Jacob Langner
Eric Sax
31
3
0
04 Dec 2023
Text and Click inputs for unambiguous open vocabulary instance segmentation
Nikolai Warner
Meera Hahn
Jonathan Huang
Irfan Essa
Vighnesh Birodkar
VLM
11
0
0
24 Nov 2023
Multi-View Spectrogram Transformer for Respiratory Sound Classification
Wentao He
Yuchen Yan
Jianfeng Ren
Ruibin Bai
Xudong Jiang
MedIm
ViT
17
7
0
16 Nov 2023
VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search
Shuting He
Hao Luo
Wei Jiang
Xudong Jiang
Henghui Ding
11
38
0
13 Nov 2023
Towards Omni-supervised Referring Expression Segmentation
Minglang Huang
Yiyi Zhou
Gen Luo
Guannan Jiang
Weilin Zhuang
Xiaoshuai Sun
16
0
0
01 Nov 2023
Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin
Jia-Run Du
Yipeng Gao
Jiaming Zhou
Wei-Shi Zheng
42
14
0
27 Oct 2023
Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
32
121
0
07 Sep 2023
Region Generation and Assessment Network for Occluded Person Re-Identification
Shuting He
Weihua Chen
Kai Wang
Haowen Luo
F. Wang
Wei Jiang
Henghui Ding
22
35
0
07 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
18
24
0
31 Aug 2023
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
19
14
0
30 Aug 2023