WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,040 papers shown
Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning
Wonjun Lee
San Kim
Gary Geunbae Lee
51
0
0
12 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
44
0
0
11 Aug 2024
Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
Muhammad Salman Khan
Moreno La Quatra
Kuo-Hsuan Hung
Szu-Wei Fu
Sabato Marco Siniscalchi
Yu Tsao
36
2
0
08 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
45
0
0
08 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
43
2
0
08 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
85
4
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
48
0
0
05 Aug 2024
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan
Jiaqi Li
Zhiqian Lin
Weiye Xiao
Lei Yang
CVBM
VGen
61
4
0
01 Aug 2024
DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework
Fan Zhang
Naye Ji
Fuxing Gao
Bozuo Zhao
Jingmei Wu
...
Zhenqing Ye
Jiayang Zhu
WeiFan Zhong
Leyao Yan
Xiaomeng Ma
42
0
0
01 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
49
1
0
01 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
29
2
0
01 Aug 2024
Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
Jiafeng Zhong
Bin Li
Jiangyan Yi
42
1
0
31 Jul 2024
Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews
Wen Wu
Chuxu Zhang
P. Woodland
43
1
0
29 Jul 2024
MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion
Chencan Fu
Yabiao Wang
Jiangning Zhang
Zhengkai Jiang
Xiaofeng Mao
Jiafu Wu
Weijian Cao
Chengjie Wang
Yanhao Ge
Yong Liu
Mamba
65
2
0
29 Jul 2024
ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
Zhong-Qiu Wang
35
1
0
28 Jul 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
50
1
0
28 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
59
9
0
26 Jul 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
Carlos Busso
44
0
0
25 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
34
0
0
24 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Sanjeev Khudanpur
Paola García
Shinji Watanabe
48
9
0
23 Jul 2024
Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction
Rithik Sachdev
Zhong-Qiu Wang
Chao-Han Huck Yang
36
3
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
47
4
0
21 Jul 2024
Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement
Robert Sutherland
George Close
Thomas Hain
Stefan Goetze
Jon Barker
36
1
0
18 Jul 2024
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics
Cong Cai
Shan Liang
Xuefei Liu
Kang Zhu
Zhengqi Wen
...
Zhenhua Cheng
Hanzhe Xu
Ruibo Fu
Bin Liu
Yongwei Li
37
3
0
17 Jul 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
28
7
0
17 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
42
2
0
16 Jul 2024
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
Junqi Zhao
Xubo Liu
Jinzheng Zhao
Yiitan Yuan
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
36
3
0
16 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
93
1
0
16 Jul 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
39
112
0
15 Jul 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Lyna Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
36
6
0
14 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
56
33
0
11 Jul 2024
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
Paridhi Mundra
Manik Sharma
Yashwardhan Chaudhuri
Orchid Chetia Phukan
Arun Balaji Buduru
35
0
0
10 Jul 2024
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models
Yi-Cheng Lin
T. Lin
Chih-Kai Yang
Ke-Han Lu
Wei-Chih Chen
Chun-Yi Kuan
Hung-yi Lee
36
3
0
09 Jul 2024
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition
J. Duret
Mickael Rouvier
Yannick Esteve
35
0
0
08 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
41
38
0
07 Jul 2024
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
Shreya G. Upadhyay
Carlos Busso
Chi-Chun Lee
55
3
0
06 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
57
20
0
05 Jul 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar
Haroun Elleuch
Fethi Bougares
Yannick Esteve
64
0
0
05 Jul 2024
Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
Hitoshi Suda
Aya Watanabe
Shinnosuke Takamichi
36
0
0
05 Jul 2024
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production
Jian Ma
Wenguan Wang
Yi Yang
Feng Zheng
50
1
0
04 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
49
2
0
04 Jul 2024
Improving Self-supervised Pre-training using Accent-Specific Codebooks
Darshan Prabhu
Abhishek Gupta
Omkar Nitsure
Preethi Jyothi
Sriram Ganapathy
SSL
52
0
0
04 Jul 2024
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
Chin Yuen Kwok
J. Yip
Eng Siong Chng
CLL
46
1
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
56
1
0
03 Jul 2024
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Mengzhe Geng
Zengrui Jin
Jiajun Deng
...
Yi Wang
Mingyu Cui
Tianzi Wang
Helen Meng
Xunying Liu
61
6
0
03 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
24
1
0
02 Jul 2024
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
Yu-Kuan Fu
Cheng-Kuang Lee
Hsiu-Hsuan Wang
Hung-yi Lee
35
0
0
02 Jul 2024
ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
Ruibo Fu
Rui Liu
Chunyu Qiang
Yingming Gao
Yi Lu
...
Chen Zhang
Hui Bu
Yukun Liu
Xin Qi
Guanjun Li
30
5
0
01 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
47
8
0
30 Jun 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
33
1
0
30 Jun 2024