1510.08484
Cited By
MUSAN: A Music, Speech, and Noise Corpus
28 October 2015
David Snyder
Guoguo Chen
Daniel Povey
Papers citing "MUSAN: A Music, Speech, and Noise Corpus"
22 / 22 papers shown
Title
Learning Emotion-Invariant Speaker Representations for Speaker Verification
Jingguang Tian
Xinhui Hu
Xinkang Xu
92
2
0
24 May 2025
Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting
Youngmoon Jung
Yong-Hyeok Lee
Myunghun Jung
Jaeyoung Roh
Chang Woo Han
Hoon-Young Cho
38
0
0
22 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
105
0
0
06 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
347
0
0
06 May 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
67
0
0
23 Apr 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
135
1
0
03 Feb 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
Korbinian Riedhammer
Tobias Bocklet
134
0
0
03 Feb 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Junan Zhang
Jing Yang
Zihao Fang
Yansen Wang
Zehua Zhang
Zhuo Wang
Fan Fan
Zhikai Wu
105
4
0
26 Jan 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
91
0
0
23 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
99
1
0
23 Jan 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
Hong Li
68
0
0
03 Jan 2025
Guided Speaker Embedding
Shota Horiguchi
Takafumi Moriya
Atsushi Ando
Takanori Ashihara
Hiroshi Sato
Naohiro Tawara
Marc Delcroix
71
0
0
03 Jan 2025
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
Marcelo Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
101
1
0
06 Nov 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
73
18
0
23 Oct 2024
GraFPrint: A GNN-Based Approach for Audio Identification
Aditya Bhattacharjee
Shubhr Singh
Emmanouil Benetos
39
0
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
175
2
0
09 Oct 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
52
1
0
25 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie
Xubo Liu
Gaël Richard
48
1
0
17 Sep 2024
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Li-Wei Chen
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
VLM
SSL
51
0
0
16 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
63
1
0
13 Sep 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
61
1
0
08 Jul 2024
ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
Chaojian Li
Wenwan Chen
Jiayi Yuan
Yingyan Lin
Ashutosh Sabharwal
50
0
0
19 Mar 2023