ResearchTrend.AI
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

18 September 2023
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
    VOS

Papers citing "CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation"

21 / 21 papers shown
Audio Visual Segmentation Through Text Embeddings
Kyungbok Lee
You Zhang
Z. Duan
89
0
0
22 Feb 2025
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
64
2
0
28 Oct 2023
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
71
113
0
11 Jul 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
73
78
0
18 Mar 2022
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
179
1,527
0
13 Jul 2021
Associating Objects with Transformers for Video Object Segmentation
Zongxin Yang
Yunchao Wei
Yi Yang
79
290
0
04 Jun 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
163
881
0
26 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
72
188
0
06 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
64
101
0
01 Apr 2021
Learning Spatio-Temporal Transformer for Visual Tracking
Bin Yan
Houwen Peng
Jianlong Fu
Dong Wang
Huchuan Lu
ViT
60
720
0
31 Mar 2021
Transformer Tracking
Xin Chen
Bin Yan
Jiawen Zhu
Dong Wang
Xiaoyun Yang
Huchuan Lu
ViT
52
953
0
29 Mar 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
55
166
0
21 Jan 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip Torr
Li Zhang
ViT
174
2,893
0
31 Dec 2020
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
306
576
0
31 Dec 2020
End-to-End Video Instance Segmentation with Transformers
Yuqing Wang
Zhaoliang Xu
Xinlong Wang
Chunhua Shen
Baoshan Cheng
Hao Shen
Huaxia Xia
ViT
67
690
0
30 Nov 2020
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration
Zongxin Yang
Yunchao Wei
Yi Yang
VOS
109
165
0
13 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
191
5,046
0
08 Oct 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
53
156
0
13 Jul 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
351
13,002
0
26 May 2020
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
437
22,040
0
09 Dec 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
111
2,494
0
29 Sep 2016