Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.11820
Cited By
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
16 July 2024
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation"
19 / 19 papers shown
Title
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
VLM
68
21
0
13 Sep 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Yiran Zhong
Yuchao Dai
DiffM
92
29
0
31 Jul 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
111
37
0
25 May 2023
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jing Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
87
116
0
11 Jul 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
67
43
0
25 Mar 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
61
41
0
22 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
272
2,385
0
02 Dec 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
212
1,551
0
13 Jul 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
88
191
0
06 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
467
21,603
0
25 Mar 2021
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
64
157
0
13 Jul 2020
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
68
346
0
10 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
114
530
0
18 Dec 2017
Feature Pyramid Networks for Object Detection
Nayeon Lee
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
488
22,158
0
09 Dec 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
130
2,510
0
29 Sep 2016
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Fausto Milletari
Nassir Navab
Seyed-Ahmad Ahmadi
248
8,722
0
15 Jun 2016
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer
Jonathan Long
Trevor Darrell
VOS
SSeg
750
37,895
0
20 May 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.9K
77,441
0
18 May 2015
1