Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.09324
Cited By
Localizing Visual Sounds the Easy Way
17 March 2022
Shentong Mo
Pedro Morgado
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Localizing Visual Sounds the Easy Way"
50 / 62 papers shown
Title
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
12
0
0
17 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
55
0
0
08 May 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
50
0
0
30 Apr 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
65
0
0
17 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
59
0
0
17 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
55
0
0
15 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
42
0
0
04 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
60
4
0
18 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
25
0
0
30 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
54
3
0
03 Oct 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
36
2
0
01 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
33
2
0
31 Aug 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen
Zihao Yue
Boshen Xu
Qin Jin
SSL
57
0
0
25 Aug 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
38
3
0
18 Jul 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang
Dejan Marković
Chenliang Xu
Alexander Richard
33
5
0
18 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
48
2
0
07 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
36
4
0
04 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
31
2
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
46
9
0
01 Jul 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
34
4
0
07 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
56
0
0
04 Jun 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
35
2
0
12 May 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
42
8
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
46
4
0
26 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
43
5
0
21 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
42
18
0
08 Mar 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
51
12
0
05 Mar 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Yuhao Zhao
Hu Su
Wei Zou
47
4
0
05 Mar 2024
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
Adrian S. Roman
Baladithya Balamurugan
Rithik Pothuganti
30
5
0
29 Jan 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
27
13
0
02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo
Bhiksha Raj
VOS
51
12
0
25 Nov 2023
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
35
7
0
07 Nov 2023
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
20
2
0
01 Nov 2023
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiulian Peng
Rita Singh
Yan Lu
Bhiksha Raj
VOS
39
10
0
29 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
VLM
26
18
0
13 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
43
23
0
11 Sep 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
33
28
0
21 Aug 2023
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Chen Liu
Peike Li
Hu Zhang
Lincheng Li
Zi Huang
Dadong Wang
Xin Yu
VOS
45
25
0
20 Aug 2023
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
Chen Liu
Peike Li
Xingqun Qi
Hu Zhang
Lincheng Li
Dadong Wang
Xin Yu
VOS
45
30
0
31 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
36
4
0
10 Jul 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
30
22
0
22 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
43
90
0
14 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
90
49
0
03 May 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
27
15
0
06 Apr 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
45
42
0
29 Mar 2023
1
2
Next