Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.15058
Cited By
Mix and Localize: Localizing Sound Sources in Mixtures
28 November 2022
Xixi Hu
Ziyang Chen
Andrew Owens
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mix and Localize: Localizing Sound Sources in Mixtures"
46 / 46 papers shown
Title
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
26
0
0
14 May 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
48
0
0
30 Apr 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
42
0
0
04 Mar 2025
Enhancing Explainability with Multimodal Context Representations for Smarter Robots
Anargh Viswanath
Lokesh Veeramacheneni
Hendrik Buschmeier
46
0
0
28 Feb 2025
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
75
0
0
24 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
25
0
0
30 Oct 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
Xavier Juanola
Gloria Haro
Magdalena Fuentes
36
2
0
01 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
33
2
0
31 Aug 2024
Enhancing Sound Source Localization via False Negative Elimination
Zengjie Song
Jiangshe Zhang
Yuxi Wang
Junsong Fan
Zhaoxiang Zhang
43
0
0
29 Aug 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen
Zihao Yue
Boshen Xu
Qin Jin
SSL
49
0
0
25 Aug 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
45
2
0
07 Jul 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
34
4
0
07 Jun 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
45
1
0
13 May 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud
Yapeng Tian
Diana Marculescu
42
8
0
02 Apr 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
46
4
0
26 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
43
5
0
21 Mar 2024
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
51
12
0
05 Mar 2024
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Yuhao Zhao
Hu Su
Wei Zou
47
4
0
05 Mar 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
42
59
0
31 Jan 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
27
13
0
02 Dec 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
33
9
0
25 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
19
1
0
12 Oct 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
37
23
0
11 Sep 2023
Learning to Taste: A Multimodal Wine Dataset
Thoranna Bender
Simon Moe Sorensen
A. Kashani
K. E. Hjorleifsson
Grethe Hyldig
Søren Hauberg
Serge Belongie
Frederik Warburg
CoGe
30
2
0
31 Aug 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
33
28
0
21 Aug 2023
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao
Yuxin Mao
Bowen He
Xiaodong Han
Yuchao Dai
Yiran Zhong
VOS
VGen
36
30
0
16 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
25
2
0
09 Aug 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
26
10
0
27 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Yuanyuan Jiang
Jianqin Yin
21
7
0
21 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
32
8
0
06 May 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
27
15
0
06 Apr 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
45
42
0
29 Mar 2023
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Reuben Tan
Arijit Ray
Andrea Burns
Bryan A. Plummer
Justin Salamon
Oriol Nieto
Bryan C. Russell
Kate Saenko
23
21
0
28 Mar 2023
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
29
30
0
23 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
26
12
0
20 Mar 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng
Ziyang Chen
Andrew Owens
31
71
0
04 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
27
19
0
26 Apr 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
257
2,233
0
14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Daniel Stoller
Sebastian Ewert
S. Dixon
AI4TS
104
589
0
08 Jun 2018
1