Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.01518
Cited By
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
2 January 2025
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation"
16 / 16 papers shown
Title
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
60
0
0
06 May 2025
Understanding Co-speech Gestures in-the-wild
Sindhu B. Hegde
KR Prajwal
Taein Kwon
Andrew Zisserman
SLR
57
0
0
28 Mar 2025
Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
Minsu Kim
Rodrigo Mira
Honglie Chen
Stavros Petridis
M. Pantic
69
0
0
13 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
60
4
0
18 Nov 2024
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
Tianrui Pan
Jie Liu
Bohan Wang
Jie Tang
Gangshan Wu
40
2
0
27 Jul 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
47
4
0
13 Jun 2024
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye
Wenming Yang
Yapeng Tian
34
10
0
31 Oct 2023
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
Yiyang Su
A. Vosoughi
Shijian Deng
Yapeng Tian
Chenliang Xu
26
4
0
18 Oct 2023
IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
Kai Li
Run Yang
Fuchun Sun
Xiaolin Hu
32
6
0
16 Aug 2023
Speech inpainting: Context-based speech synthesis guided by video
Juan F. Montesinos
Daniel Michelsanti
G. Haro
Zheng-Hua Tan
Jesper Jensen
21
3
0
01 Jun 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang
Y. Qian
38
10
0
25 May 2023
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Reuben Tan
Arijit Ray
Andrea Burns
Bryan A. Plummer
Justin Salamon
Oriol Nieto
Bryan C. Russell
Kate Saenko
23
21
0
28 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
29
13
0
15 Mar 2023
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Interspeech 2021 Deep Noise Suppression Challenge
Chandan K. A. Reddy
Harishchandra Dubey
K. Koishida
A. Nair
Vishak Gopal
Ross Cutler
Sebastian Braun
H. Gamper
R. Aichner
Sriram Srinivasan
AI4CE
80
160
0
06 Jan 2021
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
1