ResearchTrend.AI
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

16 July 2024
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
    VOS

Papers citing "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation"

19 / 19 papers shown
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
VLM
68
21
0
13 Sep 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Yiran Zhong
Yuchao Dai
DiffM
92
29
0
31 Jul 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
111
37
0
25 May 2023
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
87
116
0
11 Jul 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
67
43
0
25 Mar 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
61
41
0
22 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
272
2,385
0
02 Dec 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM, ViT
212
1,551
0
13 Jul 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
88
191
0
06 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
467
21,603
0
25 Mar 2021
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
64
157
0
13 Jul 2020
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
68
346
0
10 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD, VOS
114
530
0
18 Dec 2017
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
488
22,158
0
09 Dec 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
130
2,510
0
29 Sep 2016
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Fausto Milletari
Nassir Navab
Seyed-Ahmad Ahmadi
248
8,722
0
15 Jun 2016
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer
Jonathan Long
Trevor Darrell
VOS, SSeg
750
37,895
0
20 May 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg, 3DV
1.9K
77,441
0
18 May 2015