Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

31 October 2022
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
arXiv: 2210.17016

Papers citing "Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit"

50 / 71 papers shown
Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
Yuxin Lin
Yinglin Zheng
Ming Zeng
Wangzheng Shi
7
0
0
19 May 2025
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Germann
DiffM
22
0
0
16 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
39
0
0
10 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
51
1
0
03 Apr 2025
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
Jakaria Islam Emon
Md Abu Salek
Kazi Tamanna Alam
52
0
0
13 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xinbing Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
84
13
0
03 Mar 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
Boyi Kang
Xinfa Zhu
Zihan Zhang
Zhen Ye
Mingshuai Liu
...
Jun Chen
Longshuai Xiao
Chao Weng
Wei Xue
Lei Xie
AuLLM
55
3
0
01 Mar 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
98
0
0
21 Feb 2025
Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
Lian Remme
Kevin Tang
75
0
0
18 Feb 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
41
6
0
17 Jan 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
107
1
0
17 Jan 2025
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
31
1
0
15 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
128
2
0
09 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
37
0
0
09 Oct 2024
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
Fengrun Zhang
Wangjin Zhou
Yiming Liu
Wang Geng
Yahui Shan
Chen Zhang
26
0
0
24 Sep 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
42
8
0
24 Sep 2024
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
Shuai Wang
Pengcheng Zhu
Haizhou Li
28
0
0
24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
23
0
0
23 Sep 2024
WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
Junzuo Zhou
Jiangyan Yi
Yong Ren
Jianhua Tao
Tao Wang
Chu Yuan Zhang
29
4
0
18 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
5
0
16 Sep 2024
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
Junjie Li
Ke Zhang
Shuai Wang
Haizhou Li
Man-Wai Mak
Kong Aik Lee
32
1
0
15 Sep 2024
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
Xiaoyu Liu
Xu Li
Joan Serrà
Santiago Pascual
31
3
0
14 Sep 2024
Unified Audio Event Detection
Yidi Jiang
Ruijie Tao
Wen Huang
Qian Chen
Wen Wang
48
0
0
13 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
25
1
0
12 Sep 2024
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
Beilong Tang
Bang Zeng
Ming Li
35
2
0
12 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
44
0
0
10 Sep 2024
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Zhengyang Chen
Bing Han
Shuai Wang
Yidi Jiang
Yanmin Qian
48
0
0
07 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
42
21
0
05 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
43
0
0
31 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
44
1
0
20 Aug 2024
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
Zhichao Wang
Yuanzhe Chen
Xinsheng Wang
Lei Xie
Yuping Wang
36
1
0
05 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondřej Klejch
Peter Bell
37
0
0
17 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
46
1
0
08 Jul 2024
Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency
Roman Aperdannier
Sigurd Schacht
Alexander Piazza
44
0
0
05 Jul 2024
CEC: A Noisy Label Detection Method for Speaker Recognition
Yao Shen
Yingying Gao
Yaqian Hao
Chenguang Hu
Fulin Zhang
Junlan Feng
Shilei Zhang
NoLa
34
0
0
19 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
36
1
0
18 Jun 2024
Robust Channel Learning for Large-Scale Radio Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
56
2
0
16 Jun 2024
The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments
Shareef Babu Kalluri
Prachi Singh
Pratik Roy Chowdhuri
Apoorva Kulkarni
Shikha Baghel
...
Swapnil Sontakke
D. K T
S. R. M. Prasanna
Deepu Vijayasenan
Sriram Ganapathy
37
3
0
13 Jun 2024
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Martina Valente
Fabio Brugnara
Giovanni Morrone
Enrico Zovato
Leonardo Badino
35
0
0
13 Jun 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
Zhengyang Chen
Xuechen Liu
Erica Cooper
Junichi Yamagishi
Yanmin Qian
48
2
0
13 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
47
15
0
08 Jun 2024
MaskSR: Masked Language Model for Full-band Speech Restoration
Xu Li
Qirui Wang
Xiaoyu Liu
47
8
0
04 Jun 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
24
0
0
01 May 2024
Certification of Speaker Recognition Models to Additive Perturbations
Dmitrii Korzh
Elvir Karimov
Mikhail Aleksandrovich Pautov
Oleg Y. Rogov
Ivan Oseledets
50
1
0
29 Apr 2024
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
Huali Zhou
Yuke Lin
Dongxi Liu
Ming Li
37
0
0
20 Mar 2024
Enhancing Audio Generation Diversity with Visual Information
Zeyu Xie
Baihan Li
Xuenan Xu
Mengyue Wu
Kai Yu
34
3
0
02 Mar 2024
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
Takanori Ashihara
Marc Delcroix
Takafumi Moriya
Kohei Matsuura
Taichi Asami
Yusuke Ijima
SSL
24
7
0
31 Jan 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
81
21
0
30 Jan 2024