2404.12725
Cited By
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
19 April 2024
Zhaoxi Mu
Xinyu Yang
Papers citing "Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction"
7 / 7 papers shown
Title
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
57
0
0
06 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
120
0
0
06 May 2025
Distance Based Single-Channel Target Speech Extraction
Runwu Shi
Benjamin Yen
Kazuhiro Nakadai
33
0
0
31 Dec 2024
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
Donghang Wu
Yiwen Wang
Xihong Wu
T. Qu
Mamba
32
3
0
07 Sep 2024
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
Vahid Ahmadi Kalkhorani
Cheng Yu
Anurag Kumar
Ke Tan
Buye Xu
DeLiang Wang
32
0
0
17 Jun 2024
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
190
198
0
08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
227
2,233
0
14 Jun 2018