ResearchTrend.AI
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

25 May 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS

Papers citing "Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation"

36 / 36 papers shown
Title
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Pengxiang Li
Shilin Yan
Joey Tsai
Renrui Zhang
Ruichuan An
Ziyu Guo
Xiaowei Gao
5
0
0
26 May 2025
Progressive Scaling Visual Object Tracking
Jack Hong
Shilin Yan
Zehao Xiao
Jiayin Cai
Xiaolong Jiang
Yao Hu
Henghui Ding
0
0
0
26 May 2025
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
Shilin Yan
Jiaming Han
Joey Tsai
Hongwei Xue
Rongyao Fang
Lingyi Hong
Ziyu Guo
Ray Zhang
VLM
19
0
0
22 May 2025
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Haofeng Liu
Mingqi Gao
Xuxiao Luo
Ziyue Wang
Guanyi Qin
Jinlin Wu
Yueming Jin
50
0
0
13 May 2025
Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching
Heng Liu
Guanghui Li
Mingqi Gao
Xiantong Zhen
Feng Zheng
Yansen Wang
VOS
76
0
0
18 Apr 2025
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation
Hao Fang
Runmin Cong
Xiankai Lu
Zheyu Chen
Wei Zhang
42
0
0
07 Apr 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
Tianming Liang
Haichao Jiang
Wei-Shi Zheng
Jian-Fang Hu
51
0
0
30 Mar 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
101
0
0
17 Mar 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
67
0
0
17 Mar 2025
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
Suhwan Cho
Seunghoon Lee
Minhyeok Lee
Jungho Lee
Sangyoun Lee
VOS
106
0
0
05 Mar 2025
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong
Meng Lan
Qian Zhang
Lefei Zhang
VOS
VGen
78
1
0
23 Jan 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Zhiyong Yang
Pingping Zhang
Huchuan Lu
49
1
0
15 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
114
115
0
10 Jan 2025
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
119
2
0
26 Nov 2024
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi
Zhipeng Bao
Yu-Xiong Wang
P. Tokmakov
Martial Hebert
VOS
47
0
0
30 Oct 2024
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong
Jinglun Li
Xinyu Zhou
Shilin Yan
Pinxue Guo
...
Zhaoyu Chen
Shuyong Gao
Wei Zhang
Hong Lu
Wenqiang Zhang
ViT
46
1
0
26 Sep 2024
LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
Henghui Ding
Lingyi Hong
Chang Liu
Ning Xu
L. Yang
...
Bin Cao
Yisi Zhang
Hanyi Wang
Xingjian He
Jing Liu
VOS
49
2
0
09 Sep 2024
The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation
Tuyen Tran
53
2
0
22 Aug 2024
UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track
Hao Fang
Feiyu Pan
Xiankai Lu
Wei Zhang
Runmin Cong
63
3
0
19 Aug 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
80
3
0
18 Jul 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
80
7
0
16 Jul 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang
Peiwen Sun
Dongzhan Zhou
Guangyao Li
Honggang Zhang
Di Hu
VOS
72
5
0
15 Jul 2024
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang
Peiwen Sun
Yuanchao Li
Honggang Zhang
Di Hu
74
5
0
15 Jul 2024
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Henghui Ding
Chang Liu
Yunchao Wei
Nikhila Ravi
Shuting He
...
Bo Zhao
Jing Liu
Feiyu Pan
Hao Fang
Xiankai Lu
68
8
0
24 Jun 2024
2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
Bin Cao
Yisi Zhang
Xuanxu Lin
Xingjian He
Bo Zhao
Jing Liu
89
2
0
20 Jun 2024
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
Ci-Siang Lin
I-Jieh Liu
Min-Hung Chen
Chien-Yi Wang
Sifei Liu
Yu-Chiang Frank Wang
VOS
65
0
0
18 Jun 2024
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
Mingqi Gao
Jingnan Luo
Jinyu Yang
Jungong Han
Feng Zheng
47
2
0
11 Jun 2024
SC-HVPPNet: Spatial and Channel Hybrid-Attention Video Post-Processing Network with CNN and Transformer
Tong Zhang
Wenxue Cui
Shao-Bin Liu
Feng Jiang
44
1
0
23 Apr 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong
Shilin Yan
Renrui Zhang
Wanyun Li
Xinyu Zhou
...
Kaixun Jiang
Yiting Chen
Jinglun Li
Zhaoyu Chen
Wenqiang Zhang
VLM
41
43
0
14 Mar 2024
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation
Zhuoyan Luo
Yicheng Xiao
Yong Liu
Yitong Wang
Yansong Tang
Xiu Li
Yujiu Yang
VOS
43
2
0
01 Jan 2024
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan
Xiaohao Xu
Renrui Zhang
Lingyi Hong
Wenchao Chen
Wenqiang Zhang
Wei Zhang
VOS
39
8
0
21 Sep 2023
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
Dongming Wu
Tiancai Wang
Yuang Zhang
Xiangyu Zhang
Jianbing Shen
VOS
61
34
0
18 Jul 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
92
49
0
03 May 2023
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation
Zhiwei Hu
Bo Chen
Yuan Gao
Zhilong Ji
Jinfeng Bai
VOS
63
5
0
27 Dec 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
152
314
0
04 Dec 2021
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
196
288
0
19 Mar 2020