ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.13900
  4. Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
    SSL
ArXivPDFHTML

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

22 / 1,022 papers shown
Title
Enhancing Speech Recognition Decoding via Layer Aggregation
Enhancing Speech Recognition Decoding via Layer Aggregation
Tomer Wullach
Shlomo E. Chazan
24
1
0
21 Mar 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
35
106
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
Automatic speaker verification spoofing and deepfake detection using
  wav2vec 2.0 and data augmentation
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
Hemlata Tak
Massimiliano Todisco
Xin Wang
Jee-weon Jung
Junichi Yamagishi
Nicholas W. D. Evans
34
151
0
24 Feb 2022
Multimodal Emotion Recognition using Transfer Learning from Speaker
  Recognition and BERT-based models
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Sarala Padi
S. O. Sadjadi
Tianyi Zhou
Ram D. Sriram
28
36
0
16 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing
textless-lib: a Library for Textless Spoken Language Processing
Eugene Kharitonov
Jade Copet
Kushal Lakhotia
Tu Nguyen
Paden Tomasello
...
A. Elkahky
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
25
32
0
15 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech,
  Vision and Language
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
35
836
0
07 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual
  Grounding and Masked Language Modeling
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David Harwath
SSL
40
26
0
07 Feb 2022
Robust Self-Supervised Audio-Visual Speech Recognition
Robust Self-Supervised Audio-Visual Speech Recognition
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed
33
90
0
05 Jan 2022
Multi-Variant Consistency based Self-supervised Learning for Robust
  Automatic Speech Recognition
Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition
Changfeng Gao
Gaofeng Cheng
Pengyuan Zhang
25
4
0
23 Dec 2021
Self-Supervised Learning for speech recognition with Intermediate layer
  supervision
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
21
28
0
16 Dec 2021
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion
  Recognition, Speaker Verification and Spoken Language Understanding
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
23
146
0
04 Nov 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
  Processing
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
118
193
0
14 Oct 2021
Pretext Tasks selection for multitask self-supervised speech
  representation learning
Pretext Tasks selection for multitask self-supervised speech representation learning
Salah Zaiem
Titouan Parcollet
S. Essid
Abdel Heba
SSL
14
12
0
01 Jul 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for
  Signal Recognition
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
40
9
0
05 Jun 2021
A Review of Speaker Diarization: Recent Advances with Deep Learning
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park
Naoyuki Kanda
Dimitrios Dimitriadis
Kyu Jeong Han
Shinji Watanabe
Shrikanth Narayanan
VLM
274
327
0
24 Jan 2021
Interspeech 2021 Deep Noise Suppression Challenge
Interspeech 2021 Deep Noise Suppression Challenge
Chandan K. A. Reddy
Harishchandra Dubey
K. Koishida
A. Nair
Vishak Gopal
Ross Cutler
Sebastian Braun
H. Gamper
R. Aichner
Sriram Srinivasan
AI4CE
80
160
0
06 Jan 2021
Bayesian HMM clustering of x-vector sequences (VBx) in speaker
  diarization: theory, implementation and analysis on standard tasks
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
Federico Landini
Jan Profant
Mireia Díez
L. Burget
216
199
0
29 Dec 2020
Don't shoot butterfly with rifles: Multi-channel Continuous Speech
  Separation with Early Exit Transformer
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer
Sanyuan Chen
Yu-Huan Wu
Zhuo Chen
Takuya Yoshioka
Shujie Liu
Jinyu Li
29
26
0
23 Oct 2020
Multi-task self-supervised learning for Robust Speech Recognition
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
189
288
0
25 Jan 2020
End-to-End Neural Speaker Diarization with Permutation-Free Objectives
End-to-End Neural Speaker Diarization with Permutation-Free Objectives
Yusuke Fujita
Naoyuki Kanda
Shota Horiguchi
Kenji Nagamatsu
Shinji Watanabe
160
244
0
12 Sep 2019
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
236
2,233
0
14 Jun 2018
Previous
123...192021