ResearchTrend.AI
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,022 papers shown
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
25
18
0
21 Jun 2022
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Ruchao Fan
Abeer Alwan
24
30
0
16 Jun 2022
Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
Jan Lehecka
J. Psutka
Josef Psutka
8
4
0
15 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Anjana Arunkumar
Vrunda N. Sukhadia
S. Umesh
25
10
0
11 Jun 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A
S. Umesh
SSL
34
8
0
09 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas L. Griffiths
Nori Jacoby
29
19
0
08 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
32
99
0
02 Jun 2022
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
43
15
0
26 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
131
349
0
21 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
25
36
0
17 May 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Bowen Shi
Abdel-rahman Mohamed
Wei-Ning Hsu
SSL
28
17
0
15 May 2022
Bias and Fairness on Multimodal Emotion Detection Algorithms
Matheus Schmitz
Rehan Ahmed
Jim Cao
FaML
49
12
0
11 May 2022
Towards Improved Zero-shot Voice Conversion with Conditional DSVAE
Jiachen Lian
Chunlei Zhang
Gopala Krishna Anumanchipalli
Dong Yu
13
23
0
11 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng
Po-Chun Hsu
Hung-yi Lee
SSL
25
8
0
08 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
45
37
0
02 May 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
18
39
0
27 Apr 2022
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao
Jianfei Song
Rui Xu
Yingfang Yang
Zijian Chen
Yafeng Deng
VLM
18
2
0
22 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
36
53
0
15 Apr 2022
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
J. Yoon
Beom Jun Woo
N. Kim
27
13
0
13 Apr 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
Lin Zhang
Xin Wang
Erica Cooper
Nicholas W. D. Evans
Junichi Yamagishi
19
56
0
11 Apr 2022
Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang
Wangjin Zhou
Chenhui Chu
Sheng Li
Raj Dabre
Raphaël Rubino
Yi Zhao
20
28
0
11 Apr 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
N. Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
24
5
0
08 Apr 2022
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas E. Zezario
Szu-Wei Fu
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
24
13
0
07 Apr 2022
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids
Ryandhimas E. Zezario
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
24
16
0
07 Apr 2022
Speech Pre-training with Acoustic Piece
Shuo Ren
Shujie Liu
Yu Wu
Long Zhou
Furu Wei
SSL
14
16
0
07 Apr 2022
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Xiaofei Wang
Dongmei Wang
Naoyuki Kanda
Sefik Emre Eskimez
Takuya Yoshioka
25
8
0
07 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
28
56
0
06 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
35
13
0
06 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
10
26
0
05 Apr 2022
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
Takaaki Saeki
Detai Xin
Wataru Nakata
Tomoki Koriyama
Shinnosuke Takamichi
Hiroshi Saruwatari
27
174
0
05 Apr 2022
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Abner Hernandez
Paula Andrea Pérez-Toro
Juan Camilo Vásquez-Correa
J. Orozco-Arroyave
Andreas Maier
S. Yang
19
1
0
04 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
51
45
0
01 Apr 2022
On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting
Wei-Tsung Kao
Yue Wu
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
14
7
0
01 Apr 2022
WavFT: Acoustic model finetuning with labelled and unlabelled data
Utkarsh Chauhan
Vikas Joshi
Rupeshkumar Mehta
16
0
0
01 Apr 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Junyi Ao
Zi-Hua Zhang
Long Zhou
Shujie Liu
Haizhou Li
Tom Ko
Lirong Dai
Jinyu Li
Yao Qian
Furu Wei
SSL
19
19
0
31 Mar 2022
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen
Pengyuan Zhang
AI4TS
16
10
0
31 Mar 2022
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Ashish Seth
L. D. Prasad
Sreyan Ghosh
S. Umesh
30
3
0
31 Mar 2022
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
L. D. Prasad
Sreyan Ghosh
S. Umesh
25
12
0
31 Mar 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
19
43
0
31 Mar 2022
SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
Kai-Wei Chang
Wei-Cheng Tseng
Shang-Wen Li
Hung-yi Lee
24
22
0
31 Mar 2022
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
Mu Yang
K. Hirschi
S. Looney
Okim Kang
John H. L. Hansen
37
15
0
29 Mar 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
Rui Wang
Qibing Bai
Junyi Ao
Long Zhou
Zhixiang Xiong
Zhihua Wei
Yu Zhang
Tom Ko
Haizhou Li
34
61
0
29 Mar 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Edresson Casanova
C. Shulby
Alexander Korolev
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
21
11
0
29 Mar 2022
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
22
31
0
29 Mar 2022
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
29
144
0
29 Mar 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
Yang Zhang
Zhiqiang Lv
Haibin Wu
Shanshan Zhang
Pengfei Hu
Zhiyong Wu
Hung-yi Lee
Helen Meng
ViT
21
130
0
29 Mar 2022
Robust Speaker Recognition with Transformers Using wav2vec 2.0
Sergey Novoselov
G. Lavrentyeva
Anastasia Avdeeva
V. Volokhov
Aleksei Gusev
ViT
13
18
0
28 Mar 2022
Training speaker recognition systems with limited data
Nik Vaessen
David A. van Leeuwen
11
6
0
28 Mar 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
Takaaki Saeki
Shinnosuke Takamichi
Tomohiko Nakamura
Naoko Tanji
Hiroshi Saruwatari
31
6
0
24 Mar 2022