2309.09709
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
18 September 2023
Kexin Li
Zongxin Yang
Lei Chen
Yi Yang
Jun Xiao
VOS
Papers citing "CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation"
21 / 21 papers shown
Title
Audio Visual Segmentation Through Text Embeddings
Kyungbok Lee
You Zhang
Z. Duan
89
0
0
22 Feb 2025
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
64
2
0
28 Oct 2023
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
71
113
0
11 Jul 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
73
78
0
18 Mar 2022
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
179
1,527
0
13 Jul 2021
Associating Objects with Transformers for Video Object Segmentation
Zongxin Yang
Yunchao Wei
Yi Yang
79
290
0
04 Jun 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
163
881
0
26 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
72
188
0
06 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
64
101
0
01 Apr 2021
Learning Spatio-Temporal Transformer for Visual Tracking
Bin Yan
Houwen Peng
Jianlong Fu
Dong Wang
Huchuan Lu
ViT
60
720
0
31 Mar 2021
Transformer Tracking
Xin Chen
Bin Yan
Jiawen Zhu
Dong Wang
Xiaoyun Yang
Huchuan Lu
ViT
52
953
0
29 Mar 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
55
166
0
21 Jan 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip Torr
Li Zhang
ViT
174
2,893
0
31 Dec 2020
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
306
576
0
31 Dec 2020
End-to-End Video Instance Segmentation with Transformers
Yuqing Wang
Zhaoliang Xu
Xinlong Wang
Chunhua Shen
Baoshan Cheng
Hao Shen
Huaxia Xia
ViT
67
690
0
30 Nov 2020
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration
Zongxin Yang
Yunchao Wei
Yi Yang
VOS
109
165
0
13 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
191
5,046
0
08 Oct 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
53
156
0
13 Jul 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
351
13,002
0
26 May 2020
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
437
22,040
0
09 Dec 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
111
2,494
0
29 Sep 2016