ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.13900
  4. Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
    SSL
ArXivPDFHTML

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,022 papers shown
Title
Self-supervised language learning from raw audio: Lessons from the Zero
  Resource Speech Challenge
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
32
30
0
27 Oct 2022
Token-level Sequence Labeling for Spoken Language Understanding using
  Compositional End-to-End Models
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora
Siddharth Dalmia
Brian Yan
Florian Metze
A. Black
Shinji Watanabe
15
12
0
27 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for
  Automatic Speech Recognition
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
40
1
0
27 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised
  Learning for Text-To-Speech
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
61
19
0
27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
46
96
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by
  Combining Regression and Improved Contrastive Learning
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
60
37
0
27 Oct 2022
UFO2: A unified pre-training framework for online and offline speech
  recognition
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
24
8
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIP
VLM
AI4TS
19
24
0
26 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
31
0
0
24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken
  sentence embeddings
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
32
2
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of
  Speech
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
29
29
0
21 Oct 2022
Large-scale learning of generalised representations for speaker
  recognition
Large-scale learning of generalised representations for speaker recognition
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesong Lee
Hye-jin Shim
Youngki Kwon
Joon Son Chung
Shinji Watanabe
CVBM
23
6
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation,
  Beamforming, and Self-Supervised Learning Representation
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
17
19
0
19 Oct 2022
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label
  Distribution Learning
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
19
3
0
18 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of
  Self-Supervised Speech Representation Learning
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
28
33
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
38
7
0
16 Oct 2022
Extracting speaker and emotion information from self-supervised speech
  models via channel-wise correlations
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Themos Stafylakis
Ladislav Mošner
Sofoklis Kakouros
Oldrich Plchot
L. Burget
J. Černocký
SSL
32
8
0
15 Oct 2022
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous
  Speech Recognition
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition
Jakob Poncelet
Hugo Van hamme
18
2
0
14 Oct 2022
TransFusion: Transcribing Speech with Multinomial Diffusion
TransFusion: Transcribing Speech with Multinomial Diffusion
Matthew Baas
Kevin Eloff
Herman Kamper
DiffM
14
4
0
14 Oct 2022
Training speech emotion classifier without categorical annotations
Training speech emotion classifier without categorical annotations
Meysam Shamsi
Marie Tahon
18
2
0
14 Oct 2022
Experiments on Turkish ASR with Self-Supervised Speech Representation
  Learning
Experiments on Turkish ASR with Self-Supervised Speech Representation Learning
Ali Safaya
E. Erzin
16
1
0
13 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
13
14
0
13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
23
47
0
13 Oct 2022
On the Use of Semantically-Aligned Speech Representations for Spoken
  Language Understanding
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
G. Laperriere
Valentin Pelloin
Mickael Rouvier
Themos Stafylakis
Yannick Esteve
27
9
0
11 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code
  Representation Learning
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
44
6
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder
  Based Speech-Text Pre-training
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
61
57
0
07 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised
  learning of speech representations
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
43
18
0
05 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised
  Pretraining
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
H. S. Bovbjerg
Zheng-Hua Tan
VLM
27
3
0
04 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language
  Model
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
46
32
0
03 Oct 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised
  Learning in Speech and Audio
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
33
9
0
30 Sep 2022
Augmentation Invariant Discrete Representation for Generative Spoken
  Language Modeling
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Itai Gat
Felix Kreuk
Tu Nguyen
Ann Lee
Jade Copet
Gabriel Synnaeve
Emmanuel Dupoux
Yossi Adi
30
11
0
30 Sep 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
27
289
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
35
55
0
30 Sep 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector
  Quantization
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie Zhang
39
4
0
28 Sep 2022
MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
Andrés Vasco-Carofilis
Laura Fernández-Robles
Enrique Alegre
Eduardo FIDALGO
40
1
0
28 Sep 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge
  2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai
Guoqiang Hong
Zhijian Ye
Ximin Li
Haizhou Li
33
7
0
23 Sep 2022
The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022
The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022
Gang Liu
Tianyan Zhou
Yong Zhao
Yu Wu
Zhuo Chen
Yao Qian
Jian Wu
14
1
0
22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples
  on Self-Supervised Speech Recognition models
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
24
1
0
17 Sep 2022
Overlapped speech and gender detection with WavLM pre-trained features
Overlapped speech and gender detection with WavLM pre-trained features
Martin Lebourdais
Marie Tahon
Antoine Laurent
S. Meignier
33
17
0
09 Sep 2022
Target Speaker Voice Activity Detection with Transformers and Its
  Integration with End-to-End Neural Diarization
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Dongmei Wang
Xiong Xiao
Naoyuki Kanda
Takuya Yoshioka
Jian Wu
33
25
0
27 Aug 2022
The ReprGesture entry to the GENEA Challenge 2022
The ReprGesture entry to the GENEA Challenge 2022
Sicheng Yang
Zhiyong Wu
Minglei Li
Mengchen Zhao
Jiuxin Lin
Liyang Chen
Weihong Bao
25
11
0
25 Aug 2022
3M: An Effective Multi-view, Multi-granularity, and Multi-aspect
  Modeling Approach to English Pronunciation Assessment
3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment
Fu-An Chao
Tien-Hong Lo
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
26
41
0
19 Aug 2022
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning
  for Speaker Verification
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification
Chunlei Zhang
Dong Yu
28
17
0
15 Aug 2022
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT
K. Kinoshita
Thilo von Neumann
Marc Delcroix
Christoph Boeddeker
Reinhold Haeb-Umbach
38
4
0
28 Jul 2022
Dive into Big Model Training
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
15
3
0
25 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech
  Recognition at Production Scale
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
27
8
0
19 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic
  Knowledge Distillation of Self-Supervised Speech Models
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
30
25
0
14 Jul 2022
Tandem Multitask Training of Speaker Diarisation and Speech Recognition
  for Meeting Transcription
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Xianrui Zheng
C. Zhang
P. Woodland
26
16
0
08 Jul 2022
Comparing supervised and self-supervised embedding for ExVo Multi-Task
  learning track
Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Tilak Purohit
Imen Ben Mahmoud
Bogdan Vlasenko
Mathew Magimai.-Doss
SSL
15
8
0
23 Jun 2022
A Systematic Comparison of Phonetic Aware Techniques for Speech
  Enhancement
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Or Tal
Moshe Mandel
Felix Kreuk
Yossi Adi
AAML
12
8
0
22 Jun 2022
Previous
123...18192021
Next