WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,036 papers shown
Title
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
Thomas Palmeira Ferraz
Marcely Zanon Boito
Caroline Brun
Vassilina Nikoulina
29
12
0
02 Nov 2023
Automatic Disfluency Detection from Untranscribed Speech
Amrit Romana
K. Koishida
E. Provost
47
6
0
01 Nov 2023
Investigating Self-Supervised Deep Representations for EEG-based Auditory Attention Decoding
Karan Thakkar
Jiarui Hai
Mounya Elhilali
18
1
0
01 Nov 2023
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
Isaac Slaughter
Craig Greenberg
Reva Schwartz
Aylin Caliskan
35
4
0
29 Oct 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
46
22
0
27 Oct 2023
CL-MASR: A Continual Learning Benchmark for Multilingual ASR
Luca Della Libera
Pooneh Mousavi
Salah Zaiem
Cem Subakan
Mirco Ravanelli
AuLLM
CLL
48
13
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
27
31
0
25 Oct 2023
Towards Streaming Speech-to-Avatar Synthesis
Tejas S. Prabhune
Peter Wu
Bohan Yu
Gopala K. Anumanchipalli
11
1
0
25 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens
Feiyu Shen
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
20
9
0
23 Oct 2023
Diffusion-Based Adversarial Purification for Speaker Verification
Yibo Bai
Ju Liu
Xuelong Li
DiffM
36
2
0
22 Oct 2023
Automatic Pronunciation Assessment -- A Review
Yassine El Kheir
Ahmed M. Ali
Shammur A. Chowdhury
32
6
0
21 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
39
206
0
20 Oct 2023
BUT CHiME-7 system description
M. Karafiát
Karel Veselý
Igor Szöke
Ladislav Mošner
Karel Beneš
Marcin Witkowski
Germán Barchi
L. Pepino
35
1
0
18 Oct 2023
CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation
Zhaojie Chu
K. Guo
Xiaofen Xing
Yilin Lan
Bolun Cai
Xiangmin Xu
43
5
0
17 Oct 2023
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Xueyao Zhang
Yicheng Gu
Haopeng Chen
Zihao Fang
Lexiao Zou
Junan Zhang
Liumeng Xue
Jinchao Zhang
Jie Zhou
Zhizheng Wu
DiffM
35
1
0
17 Oct 2023
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio
Antoni Dimitriadis
Siqi Pan
V. Sethu
Beena Ahmed
SSL
28
3
0
17 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Tomer Wullach
Shlomo E. Chazan
32
0
0
16 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
39
8
0
16 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
28
20
0
12 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
31
3
0
12 Oct 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
23
3
0
12 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
40
16
0
11 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRM
ELM
42
15
0
09 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
44
1
0
08 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
21
7
0
08 Oct 2023
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
Yuanjun Lv
Jixun Yao
Peikun Chen
Hongbin Zhou
Heng Lu
Lei Xie
30
4
0
08 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
42
81
0
07 Oct 2023
Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
Ziyun Cui
Wen Wu
Wei-Qiang Zhang
Ji Wu
Chao Zhang
28
2
0
06 Oct 2023
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Takashi Maekaku
Jiatong Shi
Xuankai Chang
Yuya Fujita
Shinji Watanabe
37
1
0
06 Oct 2023
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
Tejes Srivastava
Jiatong Shi
William Chen
Shinji Watanabe
32
1
0
05 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
23
6
0
04 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
48
24
0
04 Oct 2023
Audio-visual child-adult speaker classification in dyadic interactions
Anfeng Xu
Kevin Huang
Tiantian Feng
Helen Tager-Flusberg
Shrikanth Narayanan
20
3
0
03 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
32
16
0
02 Oct 2023
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation
Wen Wu
Wenlin Chen
C. Zhang
P. Woodland
21
1
0
30 Sep 2023
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Tobi Olatunji
Tejumade Afonja
Aditya Yadavalli
Chris C. Emezue
Sahib Singh
...
Joanne I. Osuchukwu
Salomey Osei
A. Tonja
Naome A. Etori
Clinton Mbataku
25
16
0
30 Sep 2023
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu
Xuankai Chang
Gordon Wichern
Jee-weon Jung
François G. Germain
Jonathan Le Roux
Shinji Watanabe
18
16
0
29 Sep 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
12
4
0
29 Sep 2023
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Thilo von Neumann
Christoph Boeddeker
Tobias Cord-Landwehr
Marc Delcroix
Reinhold Haeb-Umbach
23
7
0
28 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Brian Yan
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
28
2
0
27 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
31
36
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
32
42
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
32
8
0
26 Sep 2023
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
Duc-Tuan Truong
Ruijie Tao
J. Yip
Kong Aik Lee
Chng Eng Siong
32
6
0
26 Sep 2023
Online Active Learning For Sound Event Detection
Mark Lindsey
Ankit Shah
Francis Kubala
R. M. Stern
26
0
0
25 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
28
69
0
25 Sep 2023
Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units
Jakob Poncelet
Hugo Van hamme
23
3
0
25 Sep 2023
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
18
12
0
25 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
38
8
0
25 Sep 2023
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
43
2
0
24 Sep 2023