ResearchTrend.AI
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,040 papers shown
Title
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Medha Hira
Arnav Goel
Anubha Gupta
31
1
0
23 May 2024
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Chengwei Qin
Pin-Yu Chen
Chng Eng Siong
Chao Zhang
VLM
33
3
0
23 May 2024
SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models
Qingrong Cheng
Xu Li
Xinghui Fu
DiffM
38
2
0
22 May 2024
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
Tariq Adnan
Abdelrahman Abdelkader
Zipei Liu
Ekram Hossain
Sooyong Park
Md. Saiful Islam
Ehsan Hoque
38
2
0
21 May 2024
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
Hanlei Zhang
Hua Xu
Fei Long
Xin Wang
Kai Gao
49
3
0
21 May 2024
Mamba in Speech: Towards an Alternative to Self-Attention
Xiangyu Zhang
Qiquan Zhang
Hexin Liu
Tianyi Xiao
Xinyuan Qian
Beena Ahmed
E. Ambikairajah
Haizhou Li
Julien Epps
Mamba
54
38
0
21 May 2024
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
Nian Li
Jianguo Wei
ViT
32
0
0
20 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
47
4
0
16 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
38
0
14 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
84
0
09 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
16
0
08 May 2024
Adapting WavLM for Speech Emotion Recognition
Daria Diatlova
Anton Udalov
Vitalii Shutov
Egor Spirin
41
4
0
07 May 2024
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
Zhongren Dong
Zixing Zhang
Weixiang Xu
Jing Han
Jianjun Ou
Björn W. Schuller
40
2
0
07 May 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Bingshen Mu
Yangze Li
Qijie Shao
Kun Wei
Xucheng Wan
Naijun Zheng
Huan Zhou
Lei Xie
48
6
0
06 May 2024
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Alessandro Pianese
D. Cozzolino
Giovanni Poggi
L. Verdoliva
43
6
0
03 May 2024
GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
Yu Pan
Yuguang Yang
Heng Lu
Lei Ma
Jianjun Zhao
50
1
0
03 May 2024
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Aditya Chakravarty
25
0
0
02 May 2024
Efficient Compression of Multitask Multilingual Speech Models
Thomas Palmeira Ferraz
43
0
0
02 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra
Alkis Koudounas
Lorenzo Vaiani
Elena Baralis
Luca Cagliero
Paolo Garza
Sabato Marco Siniscalchi
43
10
0
02 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
39
0
0
01 May 2024
Self-supervised Pre-training of Text Recognizers
M. Kišš
Michal Hradiš
SSL
43
1
0
01 May 2024
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model
Wen-Ling Lei
Li Liu
Jun Wang
DiffM
43
2
0
30 Apr 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
49
0
0
27 Apr 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
34
2
0
24 Apr 2024
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Tsubasa Ochiai
Kazuma Iwamoto
Marc Delcroix
Rintaro Ikeshita
Hiroshi Sato
Shoko Araki
Shigeru Katagiri
29
2
0
23 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
Retrieval-Augmented Audio Deepfake Detection
Zuheng Kang
Yayun He
Botao Zhao
Xiaoyang Qu
Junqing Peng
Jing Xiao
Jianzong Wang
35
7
0
22 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
56
1
0
16 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
40
20
0
15 Apr 2024
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
K. Chumachenko
Alexandros Iosifidis
Moncef Gabbouj
29
6
0
13 Apr 2024
Voice Attribute Editing with Text Prompt
Zheng-Yan Sheng
Yang Ai
Li-Juan Liu
Jia Pan
Zhenhua Ling
28
6
0
13 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
42
10
0
09 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
43
1
0
09 Apr 2024
Test-Time Training for Depression Detection
Sri Harsha Dumpala
Chandramouli Shama Sastry
Rudolf Uher
Sageev Oore
53
0
0
07 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
37
23
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
74
41
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
41
23
0
03 Apr 2024
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
Suyash Vardhan Mathur
Akshett Rai Jindal
Hardik Mittal
Manish Shrivastava
35
1
0
02 Apr 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He
Qiaochu Huang
Zhensong Zhang
Zhiwei Lin
Zhiyong Wu
Sicheng Yang
Minglei Li
Zhiyi Chen
Songcen Xu
Xiaofei Wu
35
15
0
02 Apr 2024
Transfer Learning from Whisper for Microscopic Intelligibility Prediction
Paul Best
Santiago Cuervo
R. Marxer
41
2
0
02 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
23
46
0
31 Mar 2024
A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks
Orson Mengara
AAML
38
4
0
29 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
74
62
0
25 Mar 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng
34
1
0
24 Mar 2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech
Taiqi He
Kwanghee Choi
Lindia Tjuatja
Nathaniel R. Robinson
Jiatong Shi
Shinji Watanabe
Graham Neubig
David R. Mortensen
Lori S. Levin
VLM
30
2
0
19 Mar 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
36
4
0
19 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
38
1
0
19 Mar 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Hanlei Zhang
Xin Wang
Hua Xu
Qianrui Zhou
Kai Gao
Jianhua Su
Jinyue Zhao
Wenrui Li
Yanting Chen
45
2
0
16 Mar 2024