WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL
arXiv: 2110.13900

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,040 papers shown
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
42
0
0
30 Jun 2024
Factor-Conditioned Speaking-Style Captioning
Atsushi Ando
Takafumi Moriya
Shota Horiguchi
Ryo Masumura
43
0
0
27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
46
1
0
26 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
52
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
58
11
0
25 Jun 2024
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
36
0
0
25 Jun 2024
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator
Woo-Jin Chung
Hong-Goo Kang
31
1
0
25 Jun 2024
Self-Supervised Embeddings for Detecting Individual Symptoms of Depression
Sri Harsha Dumpala
Katerina Dikaios
Abraham Nunes
Frank Rudzicz
Rudolf Uher
Sageev Oore
SSL
49
1
0
25 Jun 2024
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
Sai Koneru
Thai-Binh Nguyen
Ngoc-Quan Pham
Danni Liu
Zhaolin Li
Alexander Waibel
Jan Niehues
OffRL
46
3
0
24 Jun 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu
Yu-Xiang Lin
Tsui-Wei Weng
59
1
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
32
3
0
22 Jun 2024
Multimodal Segmentation for Vocal Tract Modeling
Rishi Jain
Bohan Yu
Peter Wu
Tejas S. Prabhune
Gopala Anumanchipalli
45
1
0
22 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
46
3
0
22 Jun 2024
Speech Emotion Recognition under Resource Constraints with Data Distillation
Yi Chang
Zhao Ren
Zhonghao Zhao
Thanh Tam Nguyen
Kun Qian
Tanja Schultz
Björn W. Schuller
33
0
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
77
0
0
21 Jun 2024
Voice Disorder Analysis: a Transformer-based Approach
Alkis Koudounas
Gabriele Ciravegna
M. Fantini
G. Succo
Erika Crosetti
Tania Cerquitelli
Elena Baralis
35
4
0
20 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
37
15
0
20 Jun 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
49
1
0
20 Jun 2024
Children's Speech Recognition through Discrete Token Enhancement
Vrunda N. Sukhadia
Shammur A. Chowdhury
53
1
0
19 Jun 2024
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
Martin Lebourdais
Théo Mariotte
Antonio Almudévar
Marie Tahon
Alfonso Ortega
37
0
0
19 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
41
3
0
18 Jun 2024
Performant ASR Models for Medical Entities in Accented Speech
Tejumade Afonja
Tobi Olatunji
Sewade Ogun
Naome A. Etori
A. Owodunni
Moshood Yekini
29
4
0
18 Jun 2024
Interface Design for Self-Supervised Speech Models
Yi-Jen Shih
David Harwath
61
1
0
18 Jun 2024
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding
G. Laperriere
Sahar Ghannay
Bassam Jabaian
Yannick Esteve
35
0
0
17 Jun 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
45
3
0
17 Jun 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
43
7
0
16 Jun 2024
Robust Channel Learning for Large-Scale Radio Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
56
2
0
16 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
45
9
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
47
10
0
15 Jun 2024
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
Federico Costa
Miquel India
Javier Hernando
41
2
0
15 Jun 2024
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
Natarajan Balaji Shankar
Ruchao Fan
Abeer Alwan
39
0
0
15 Jun 2024
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
45
7
0
15 Jun 2024
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
Joseph Liu
Mahesh Kumar Nandwana
Janne Pylkkönen
Hannes Heikinheimo
Morgan McGuire
39
1
0
14 Jun 2024
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
Zhaoqing Li
Haoning Xu
Tianzi Wang
Shoukang Hu
Zengrui Jin
Shujie Hu
Jiajun Deng
Mingyu Cui
Mengzhe Geng
Xunying Liu
MQ
44
1
0
14 Jun 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
36
0
0
14 Jun 2024
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Siddhant Arora
Ankita Pasad
Chung-Ming Chien
Jionghao Han
Roshan S. Sharma
...
William Chen
Suwon Shon
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
56
4
0
14 Jun 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Haoyu Wang
Guoqiang Hu
Guodong Lin
Wei-Qiang Zhang
Jian Li
35
1
0
14 Jun 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
Tianzi Wang
Xurong Xie
Zhaoqing Li
Shoukang Hu
Zengrui Jin
...
Shujie Hu
Mengzhe Geng
Guinan Li
Helen Meng
Xunying Liu
34
0
0
14 Jun 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
60
7
0
14 Jun 2024
On the Encoding of Gender in Transformer-based ASR Representations
Aravind Krishnan
Badr M. Abdullah
Dietrich Klakow
49
2
0
14 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
47
2
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
51
3
0
13 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
51
2
0
13 Jun 2024
LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
Amit Meghanani
Thomas Hain
46
1
0
13 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
35
3
0
13 Jun 2024
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
Arnav Goel
Medha Hira
Anubha Gupta
26
0
0
13 Jun 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Yuxun Tang
Yuning Wu
Jiatong Shi
Qin Jin
62
5
0
13 Jun 2024
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Yifeng Yu
Jiatong Shi
Yuning Wu
Shinji Watanabe
40
3
0
13 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic
Kwanghee Choi
Ankita Pasad
Tomohiko Nakamura
Satoru Fukayama
Karen Livescu
Shinji Watanabe
44
14
0
12 Jun 2024
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Chun Yin
Tai-Shih Chi
Yu Tsao
Hsin-Min Wang
42
0
0
12 Jun 2024