Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

21 July 2024
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
arXiv: 2407.15188

Papers citing "Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning"

50 / 108 papers shown
Learning Emotion-Invariant Speaker Representations for Speaker Verification
Jingguang Tian
Xinhui Hu
Xinkang Xu
107
2
0
24 May 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
76
6
0
17 Jan 2025
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
72
86
0
12 Feb 2024
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
Takanori Ashihara
Marc Delcroix
Takafumi Moriya
Kohei Matsuura
Taichi Asami
Yusuke Ijima
SSL
67
7
0
31 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
58
30
0
25 Jan 2024
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
53
11
0
14 Dec 2023
PromptSpeaker: Speaker Generation Based on Text Descriptions
Yongmao Zhang
Guanghou Liu
Yinjiao Lei
Yunlin Chen
Hao Yin
Lei Xie
Zhifei Li
58
11
0
08 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
262
1,827
0
28 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
58
18
0
13 Sep 2023
UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023
Yu Zheng
Yajun Zhang
Chuanying Niu
Yibin Zhan
Yanhua Long
Dongxing Xu
51
4
0
24 Aug 2023
Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials
Malikeh Ehghaghi
Marija Stanojevic
Ali Akram
Jekaterina Novikova
56
1
0
20 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
67
44
0
13 Jun 2023
Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification
Bing Han
Zhengyang Chen
Y. Qian
54
21
0
12 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion
Kai Liu
Z.C. Du
Xucheng Wan
Huan Zhou
72
22
0
09 Mar 2023
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking
Haibo Wang
Siqi Zheng
Yafeng Chen
Luyao Cheng
Qian Chen
78
81
0
01 Mar 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
Jianwu Dang
40
10
0
22 Feb 2023
Improving Transformer-based Networks With Locality For Automatic Speaker Verification
Mufan Sang
Yong Zhao
Gang Liu
John H. L. Hansen
Jian Wu
ViT
62
14
0
17 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
78
96
0
31 Jan 2023
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
64
88
0
31 Jan 2023
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Zhiyuan Peng
Xuanji He
Ke Ding
Tan Lee
Guanglu Wan
33
6
0
06 Dec 2022
In search of strong embedding extractors for speaker diarisation
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesung Huh
A. Brown
Youngki Kwon
Shinji Watanabe
Joon Son Chung
48
16
0
26 Oct 2022
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy
Sarina Meyer
Pascal Tilli
Pavel Denisov
Florian Lux
Julia Koch
Ngoc Thang Vu
47
32
0
13 Oct 2022
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Dongmei Wang
Xiong Xiao
Naoyuki Kanda
Takuya Yoshioka
Jian Wu
66
27
0
27 Aug 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
Jaejin Cho
Jesús Villalba
Laureano Moro-Velazquez
Najim Dehak
SSL
67
18
0
10 Aug 2022
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction
Bing Han
Zhengyang Chen
Y. Qian
37
32
0
03 Aug 2022
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings
Xiaoyi Qin
Na Li
Chao Weng
Dan Su
Ming Li
92
17
0
13 Jul 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding
R. Rikhye
Qiao Liang
Yanzhang He
Quan Wang
A. Narayanan
Tom O'Malley
Ian McGraw
61
27
0
08 Apr 2022
Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification
Sung Hwan Mun
Jee-weon Jung
Min Hyun Han
N. Kim
67
21
0
03 Apr 2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
Soumi Maiti
Yushi Ueda
Shinji Watanabe
Chunlei Zhang
Meng Yu
Shi-Xiong Zhang
Yong-mei Xu
79
32
0
31 Mar 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
Yang Zhang
Zhiqiang Lv
Haibin Wu
Shanshan Zhang
Pengfei Hu
Zhiyong Wu
Hung-yi Lee
Helen Meng
ViT
76
134
0
29 Mar 2022
Pushing the limits of raw waveform speaker recognition
Jee-weon Jung
You Jin Kim
Hee-Soo Heo
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
57
90
0
16 Mar 2022
Differentially Private Speaker Anonymization
Ali Shahin Shamsabadi
B. M. L. Srivastava
A. Bellet
Nathalie Vauquier
Emmanuel Vincent
Mohamed Maouche
Marc Tommasi
Nicolas Papernot
MIACV
113
33
0
23 Feb 2022
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
Mengzhe Geng
Xurong Xie
Zi Ye
Tianzi Wang
Guinan Li
Shujie Hu
Xunying Liu
Helen Meng
59
32
0
21 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
97
858
0
07 Feb 2022
MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances
Tianchi Liu
Rohan Kumar Das
Kong Aik Lee
Haizhou Li
103
71
0
03 Feb 2022
MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification
Ladislav Mošner
Oldrich Plchot
L. Burget
J. Černocký
50
7
0
11 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
247
1,873
0
26 Oct 2021
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification
Qingjian Lin
Lin Yang
Xuyang Wang
Xiaoyi Qin
Junjie Wang
Ming Li
53
21
0
09 Oct 2021
Fine-tuning wav2vec2 for speaker recognition
Nik Vaessen
David A. van Leeuwen
91
107
0
30 Sep 2021
The JHU submission to VoxSRC-21: Track 3
Jaejin Cho
Jesús Villalba
Najim Dehak
123
21
0
28 Sep 2021
The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021
Miao Zhao
Yufeng Ma
Min Liu
Minqiang Xu
60
59
0
05 Sep 2021
Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification
Zhengyang Chen
Shuai Wang
Y. Qian
119
38
0
31 Aug 2021
RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification
Yanfeng Wu
Chenkai Guo
Junan Zhao
Xiao Jin
Jing Xu
62
14
0
30 Aug 2021
NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition
S. O. Sadjadi
AI4TS
23
24
0
16 Aug 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
180
2,966
0
14 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
110
376
0
13 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
122
884
0
11 Jun 2021
SpeechBrain: A General-Purpose Speech Toolkit
Mirco Ravanelli
Titouan Parcollet
Peter William VanHarn Plantinga
Aku Rouhe
Samuele Cornell
...
William Aris
Hwidong Na
Yan Gao
R. Mori
Yoshua Bengio
80
765
0
08 Jun 2021
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
Yuki Takashima
Yusuke Fujita
Shinji Watanabe
Shota Horiguchi
Leibny Paola García-Perera
Kenji Nagamatsu
39
26
0
08 Jun 2021