WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,036 papers shown
VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make Keyword Spotting More Robust Against Adversarial Attacks
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Tiago H. Falk
AAML
25
0
0
22 Sep 2023
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard
Salah Zaiem
Robin Algayres
37
2
0
22 Sep 2023
NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization
Naohiro Tawara
Marc Delcroix
Atsushi Ando
A. Ogawa
40
7
0
22 Sep 2023
Audio Contrastive based Fine-tuning
Yang Wang
Qibin Liang
Chenghao Xiao
Yizhi Li
Noura Al Moubayed
Chenghua Lin
32
0
0
21 Sep 2023
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
Shuai Wang
Qibing Bai
Qi Liu
Jianwei Yu
Zhengyang Chen
Bing Han
Yan-min Qian
Haizhou Li
24
1
0
21 Sep 2023
Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
Ahmed Amine Ben Abdallah
Ata Kabboudi
Amir Kanoun
Salah Zaiem
38
1
0
20 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
34
12
0
19 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
55
14
0
19 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
27
15
0
19 Sep 2023
HTEC: Human Transcription Error Correction
Hanbo Sun
Jian Gao
Xiaomin Wu
Anjie Fang
Cheng Cao
Zheng Du
21
1
0
18 Sep 2023
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Yuang Li
Min Zhang
Chang Su
Yinglu Li
Xiaosong Qiao
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Shimin Tao
Hao Yang
30
5
0
18 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Ryandhimas E. Zezario
Fei Chen
C. Fuh
H. Wang
Yu Tsao
37
1
0
18 Sep 2023
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
George August Wright
Umberto Cappellazzo
Salah Zaiem
Desh Raj
Lucas Ondel Yang
Daniele Falavigna
Mohamed Nabih Ali
Alessio Brutti
42
2
0
18 Sep 2023
Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Dianwen Ng
Chong Zhang
Ruixi Zhang
Yukun Ma
Fabian Ritter Gutierrez
Trung Hieu Nguyen
Chongjia Ni
Shengkui Zhao
E. Chng
B. Ma
VLM
40
1
0
18 Sep 2023
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo
Seth Z. Zhao
Jiachen Lian
Gopala Anumanchipalli
Gerald Friedland
24
2
0
16 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huan Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
37
3
0
16 Sep 2023
Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting
Tiantian Feng
Shrikanth Narayanan
37
16
0
15 Sep 2023
Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection
Yilun Zhu
S. Powar
Tiago H. Falk
35
2
0
15 Sep 2023
Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
Peter Vieting
Simon Berger
Thilo von Neumann
Christoph Boeddeker
Ralf Schluter
Reinhold Haeb-Umbach
26
0
0
15 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
21
0
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
35
24
0
14 Sep 2023
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Sicheng Yang
Zehao Wang
Zhiyong Wu
Minglei Li
Zhensong Zhang
...
Lei Hao
Songcen Xu
Xiaofei Wu
Changpeng Yang
Zonghong Dai
DiffM
49
14
0
13 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
28
18
0
13 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
F. Ringeval
D. Schwab
Laurent Besacier
40
15
0
11 Sep 2023
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
Octavian Pascu
Adriana Stan
Dan Oneaţă
Elisabeta Oneata
H. Cucu
SSL
33
5
0
11 Sep 2023
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023
Haotian Wang
Yuxuan Xi
Hang Chen
Jun Du
Yan Song
...
Pengfei Hu
Ya Jiang
Shi Cheng
Jie Zhang
Yuzhe Weng
53
4
0
11 Sep 2023
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
Yusuf Brima
U. Krumnack
Simone Pika
Gunther Heidemann
SSL
34
0
0
07 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
28
2
0
06 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
47
40
0
05 Sep 2023
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition
Patrick Eickhoff
M. Möller
Theresa Pekarek-Rosin
Johannes Twiefel
Stefan Wermter
22
2
0
05 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
29
4
0
05 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
43
3
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
36
64
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
36
2
0
28 Aug 2023
The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge
Ruoyu Wang
Maokui He
Jun Du
Hengshun Zhou
Shutong Niu
...
Mengzhi Wang
Genshun Wan
Jia Pan
Jianqing Gao
Chin-Hui Lee
30
12
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
48
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
27
11
0
28 Aug 2023
The DiffuseStyleGesture+ entry to the GENEA Challenge 2023
Sicheng Yang
Haiwei Xue
Zhensong Zhang
Minglei Li
Zhiyong Wu
Xiaofei Wu
Songcen Xu
Zonghong Dai
DiffM
37
15
0
26 Aug 2023
Attention-Based Acoustic Feature Fusion Network for Depression Detection
Xiao Xu
Yang Wang
Xinru Wei
Fei Wang
Xizhe Zhang
30
5
0
24 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
31
2
0
20 Aug 2023
The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
Zexin Cai
Weiqing Wang
Yikang Wang
Ming Li
24
6
0
20 Aug 2023
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
Ryandhimas E. Zezario
B. Bai
C. Fuh
Hsin-Min Wang
Yu Tsao
16
3
0
18 Aug 2023
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Wen Wu
C. Zhang
P. Woodland
31
3
0
14 Aug 2023
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Siyuan Shan
Yang Li
A. Banerjee
Junier B. Oliva
26
4
0
11 Aug 2023
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
Fan Zhang
Naye Ji
Fuxing Gao
Siyuan Zhao
Zhaohan Wang
Shunman Li
32
0
0
11 Aug 2023
End-to-End Evaluation for Low-Latency Simultaneous Speech Translation
Christian Huber
Tu Anh Dinh
Carlos Mullov
Ngoc-Quan Pham
Thai-Binh Nguyen
...
Danni Liu
Zhaolin Li
Sai Koneru
Jan Niehues
A. Waibel
36
3
0
07 Aug 2023
Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu
Weizhen Zeng
Ye Wang
25
3
0
05 Aug 2023
Federated Representation Learning for Automatic Speech Recognition
Guruprasad V Ramesh
Gopinath Chennupati
Milind Rao
Anit Kumar Sahu
Ariya Rastrow
J. Droppo
26
0
0
03 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
42
10
0
03 Aug 2023