Local-Global Context Aware Transformer for Language-Guided Video Segmentation

18 March 2022
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
    VOS

Papers citing "Local-Global Context Aware Transformer for Language-Guided Video Segmentation"

35 / 35 papers shown
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
X. Li
Zilong Huang
Y. Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
60
2
0
14 Apr 2025
CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
Masud Ahmed
Zahid Hasan
Syed Arefinul Haque
A. Faridee
S. Purushotham
Suya You
Nirmalya Roy
53
0
0
19 Mar 2025
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
42
4
0
23 Aug 2024
General and Task-Oriented Video Segmentation
Mu Chen
Liulei Li
Wenguan Wang
Ruijie Quan
Yi Yang
VOS
59
4
0
09 Jul 2024
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
Ci-Siang Lin
I-Jieh Liu
Min-Hung Chen
Chien-Yi Wang
Sifei Liu
Yu-Chiang Frank Wang
VOS
53
0
0
18 Jun 2024
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation
H. Ateş
S. Yildirim
B. Gunturk
SupR
22
16
0
25 Apr 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
26
23
0
19 Jan 2024
End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
Yiran Guan
Zhuoguang Chen
Wenzheng Zeng
Zhiguo Cao
Yang Xiao
CVBM
40
15
0
27 Oct 2023
Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation
P. Li
Yu Zhang
L. Yuan
Huaxin Xiao
Binbin Lin
Xianghua Xu
VOS
26
17
0
21 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
39
51
0
18 Sep 2023
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
Yuanyou Xu
Zongxin Yang
Yi Yang
VOS
47
6
0
25 Aug 2023
Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation
Chen Liang
Wenguan Wang
Jiaxu Miao
Yi Yang
NAI
25
29
0
24 Aug 2023
The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
Chao Wang
Zhenghang Tang
27
1
0
11 Jul 2023
HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer
Y. Kim
Dong Won Lee
Paul Pu Liang
Sharifa Alghowinem
C. Breazeal
Hae Won Park
32
4
0
21 May 2023
Segment and Track Anything
Yangming Cheng
Liulei Li
Yuanyou Xu
Xiaodi Li
Zongxin Yang
Wenguan Wang
Yi Yang
VOS
28
193
0
11 May 2023
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
Xi Shen
Zongxin Yang
Xiaohan Wang
Jianxin Ma
Chang Zhou
Yezhou Yang
ViT
3DH
21
33
0
26 Mar 2023
Lana: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang
Wenguan Wang
Jiayi Shao
Yi Yang
LLMAG
LM&Ro
36
38
0
15 Mar 2023
Referring Multi-Object Tracking
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
26
71
0
06 Mar 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
80
0
0
05 Jan 2023
Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Cheng Ma
Yang Liu
Jiankang Deng
Lingxi Xie
Weiming Dong
Changsheng Xu
VLM
VPVLM
26
43
0
04 Nov 2022
Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing
Wenguan Wang
Yi Yang
Fei Wu
NAI
32
16
0
28 Oct 2022
Decoupling Features in Hierarchical Propagation for Video Object Segmentation
Zongxin Yang
Yi Yang
VOS
16
152
0
18 Oct 2022
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
Rui Li
Weihua Li
Yi Yang
Hanyu Wei
Jianhua Jiang
Quan-wei Bai
DiffM
24
11
0
18 Oct 2022
Towards Robust Referring Image Segmentation
Jianzong Wu
Xiangtai Li
Xia Li
Henghui Ding
Yu Tong
Dacheng Tao
3DV
29
40
0
20 Sep 2022
The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation
Leilei Cao
Zhuang Li
Bo Yan
Feng Zhang
Fengliang Qi
Yucheng Hu
Hongbin Wang
VOS
11
1
0
24 Jun 2022
Scalable Video Object Segmentation with Identification Mechanism
Zongxin Yang
Jiaxu Miao
Yunchao Wei
Wenguan Wang
Xiaohan Wang
Yi Yang
VOS
36
23
0
22 Mar 2022
Space Time Recurrent Memory Network
Hung-Cuong Nguyen
Chanho Kim
Fuxin Li
23
3
0
14 Sep 2021
A Survey on Deep Learning Technique for Video Segmentation
Tianfei Zhou
Fatih Porikli
David J. Crandall
Luc Van Gool
Wenguan Wang
VOS
25
231
0
02 Jul 2021
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation
Chen Liang
Yu Wu
Tianfei Zhou
Wenguan Wang
Zongxin Yang
Yunchao Wei
Yi Yang
VOS
16
49
0
02 Jun 2021
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation
Chen Liang
Yu Wu
Yawei Luo
Yi Yang
VOS
20
30
0
19 Mar 2021
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration
Zongxin Yang
Yunchao Wei
Yi Yang
VOS
35
163
0
13 Oct 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
415
595
0
21 Jul 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
170
286
0
19 Mar 2020
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
201
1,653
0
16 Mar 2020
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao
Si Liu
Guanbin Li
Fei-Yue Wang
Yanjie Chen
Chao Qian
Bo-wen Li
ObjD
62
174
0
16 Sep 2019