2110.13900
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,022 papers shown
Title
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
Ruoyu Zhao
Xiantao Jiang
Fei Yu
Victor C.M. Leung
Tao Wang
S. Zhang
30
0
0
06 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
84
3
0
03 Jan 2025
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
Zixiang Wan
Ziyue Qiu
Yiyang Liu
Wei-Qiang Zhang
26
0
0
31 Dec 2024
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
44
5
0
26 Dec 2024
Temporal-Frequency State Space Duality: An Efficient Paradigm for Speech Emotion Recognition
Jiaqi Zhao
Fei Wang
Kun Li
Yanyan Wei
Shengeng Tang
Shu Zhao
Xiao Sun
Mamba
107
2
0
22 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
167
4
0
22 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
75
0
0
19 Dec 2024
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
Zhoulin Ji
Chenhao Lin
Hang Wang
Chao Shen
102
0
0
12 Dec 2024
Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection
Rongfeng Su
Changqing Xu
Xinyi Wu
Feng Xu
Xie Chen
Lan Wang
Nan Yan
29
0
0
09 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
61
0
0
07 Dec 2024
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro Velázquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
74
1
0
05 Dec 2024
FreeCodec: A disentangled neural speech codec with fewer tokens
Youqiang Zheng
Weiping Tu
Yueteng Kang
Jie Chen
Yike Zhang
Li Xiao
Yuhong Yang
Long Ma
75
1
0
02 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
77
9
0
29 Nov 2024
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
Shih-Heng Wang
Zih-Ching Chen
Jiatong Shi
Ming To Chuang
Guan-Ting Lin
Kuan Po Huang
David F. Harwath
Shang-Wen Li
Hung-yi Lee
78
1
0
27 Nov 2024
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
Shih-Heng Wang
Jiatong Shi
Chien-yu Huang
Shinji Watanabe
Hung-yi Lee
69
0
0
27 Nov 2024
Multi-Resolution Generative Modeling of Human Motion from Limited Data
David Eduardo Moreno-Villamarín
A. Hilsmann
Peter Eisert
DiffM
3DH
81
0
0
25 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
79
0
0
25 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Y. Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
M. Zhang
Hao Yang
Jinsong Su
80
0
0
20 Nov 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
59
0
0
18 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
29
0
0
14 Nov 2024
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Eleonora Mancini
Francesco Paissan
Paolo Torroni
Mirco Ravanelli
Cem Subakan
46
0
0
12 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
43
0
0
11 Nov 2024
CTC-Assisted LLM-Based Contextual ASR
Guanrou Yang
Z. Ma
Zhifu Gao
Shiliang Zhang
Xie Chen
26
2
0
10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
M. Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
53
1
0
06 Nov 2024
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
Wen-Chin Huang
Erica Cooper
T. Toda
40
4
0
06 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
H. Li
34
4
0
05 Nov 2024
Real-Time Scream Detection and Position Estimation for Worker Safety in Construction Sites
Bikalpa Gautam
Anmol Guragain
Sarthak Giri
29
0
0
05 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
39
4
0
04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
28
0
0
31 Oct 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
36
0
0
31 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
37
1
0
30 Oct 2024
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
Abdelrahman Abdelwahab
Ayaan Vaswani
Advait Bharathulwar
Arnav Kommaraju
24
1
0
26 Oct 2024
Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation
Sixu An
X. Sun
Yicong Li
Yu Yang
Guandong Xu
31
0
0
26 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
34
0
0
24 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
54
2
0
23 Oct 2024
Characterizing Robocalls with Multiple Vantage Points
Sathvik Prasad
Aleksandr Nahapetyan
Bradley Reaves
24
0
0
22 Oct 2024
Continuous Speech Tokenizer in Text To Speech
Yixing Li
Ruobing Xie
X. Sun
Yu Cheng
Zhanhui Kang
AuLLM
CLL
55
2
0
22 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
1
0
21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh
Melanie Jouaiti
Arnab Das
Yamini Sinha
Tim Polzehl
Ingo Siegert
Sebastian Stober
23
2
0
20 Oct 2024
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh
Tim Thiele
Frederic Lorbeer
Frank Dreyer
Sebastian Stober
30
0
0
20 Oct 2024
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Yuzhe Weng
Haotian Wang
Tian Gao
Kewei Li
Shutong Niu
Jun Du
33
0
0
19 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
28
0
0
19 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
21
0
0
18 Oct 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
E. Burnaev
OT
34
1
0
17 Oct 2024
STCON System for the CHiME-8 Challenge
Anton Mitrofanov
Tatiana Prisyach
Tatiana Timofeeva
Sergei Novoselov
M. Korenevsky
...
Dmitriy Miroshnichenko
Nikita Mamaev
Ilya Odegov
Olga Rudnitskaya
A. Romanenko
26
1
0
17 Oct 2024
On the Use of Audio to Improve Dialogue Policies
Daniel Roncel
Federico Costa
Javier Hernando
28
0
0
17 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Y. Kawaguchi
32
0
0
17 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth
Ramaneswaran Selvakumar
S. Sakshi
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
24
0
0
17 Oct 2024
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
Orchid Chetia Phukan
Devyani Koshal
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
21
0
0
16 Oct 2024
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
Sarthak Jain
Orchid Chetia Phukan
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
CLL
21
0
0
16 Oct 2024