ResearchTrend.AI
Localizing Visual Sounds the Easy Way

17 March 2022
Shentong Mo
Pedro Morgado
arXiv: 2203.09324

Papers citing "Localizing Visual Sounds the Easy Way"

50 / 62 papers shown
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
12
0
0
17 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
55
0
0
08 May 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
50
0
0
30 Apr 2025
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim
Youngkil Song
Jicheol Park
Won Hwa Kim
Suha Kwak
22
0
0
21 Apr 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
65
0
0
17 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
59
0
0
17 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
55
0
0
15 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
42
0
0
04 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
60
4
0
18 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
25
0
0
30 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
54
3
0
03 Oct 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
36
2
0
01 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
33
2
0
31 Aug 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen
Zihao Yue
Boshen Xu
Qin Jin
SSL
57
0
0
25 Aug 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
38
3
0
18 Jul 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang
Dejan Marković
Chenliang Xu
Alexander Richard
33
5
0
18 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
48
2
0
07 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
36
4
0
04 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
31
2
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
46
9
0
01 Jul 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
34
4
0
07 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
56
0
0
04 Jun 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
35
2
0
12 May 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
42
8
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
46
4
0
26 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
43
5
0
21 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
42
18
0
08 Mar 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
51
12
0
05 Mar 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Yuhao Zhao
Hu Su
Wei Zou
47
4
0
05 Mar 2024
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
Adrian S. Roman
Baladithya Balamurugan
Rithik Pothuganti
30
5
0
29 Jan 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
27
13
0
02 Dec 2023
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo
Bhiksha Raj
VOS
51
12
0
25 Nov 2023
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
35
7
0
07 Nov 2023
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
20
2
0
01 Nov 2023
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiulian Peng
Rita Singh
Yan Lu
Bhiksha Raj
VOS
39
10
0
29 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
VLM
26
18
0
13 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
43
23
0
11 Sep 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
33
28
0
21 Aug 2023
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Chen Liu
Peike Li
Hu Zhang
Lincheng Li
Zi Huang
Dadong Wang
Xin Yu
VOS
45
25
0
20 Aug 2023
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
Chen Liu
Peike Li
Xingqun Qi
Hu Zhang
Lincheng Li
Dadong Wang
Xin Yu
VOS
45
30
0
31 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
36
4
0
10 Jul 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
30
22
0
22 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
43
90
0
14 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
90
49
0
03 May 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
27
15
0
06 Apr 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
45
42
0
29 Mar 2023