2111.02392
Cited By
v1
v2 (latest)
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
3 November 2021
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
ArXiv (abs)
PDF
HTML
Github (428★)
Papers citing
"A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion"
50 / 70 papers shown
Title
Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching
Shoutrik Das
Nishant Singh
Arjun Gangwar
S. Umesh
17
0
0
19 Jun 2025
Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
Xiaodan Chen
Xiaoxue Gao
M. Quoy
Alexandre Pitti
Nancy F. Chen
24
0
0
13 Jun 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
104
0
0
12 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
20
0
0
10 Jun 2025
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti
T. Nguyen
Alexander Waibel
DRL
146
0
0
04 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
39
0
0
01 Jun 2025
ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
Pengyu Ren
Wenhao Guan
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
44
0
0
01 Jun 2025
Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
111
1
0
25 May 2025
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
119
0
0
23 May 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
113
0
0
22 Apr 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
145
0
0
11 Apr 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
Kai Chen
Matthew Baas
Shlomo Dubnov
55
1
0
08 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
59
0
0
08 Apr 2025
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
Hedi Naouara
Jean-Pierre Lorré
Jérôme Louradour
76
0
0
03 Apr 2025
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters
Xuli Shen
Hua Cai
Dingding Yu
Weilin Shen
Qing-Song Xu
Xiangyang Xue
109
0
0
25 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
91
1
0
07 Mar 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
96
2
0
08 Feb 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan Chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
96
6
0
28 Jan 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu
Nils Peters
136
0
0
22 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
77
1
0
11 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
129
3
0
10 Jan 2025
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
Nobuaki Minematsu
114
0
0
04 Dec 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
68
1
0
25 Sep 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas Ueda
Leonardo B. de M. M. Marques
Flávio O. Simões
Mário Uliani Neto
Fernando Runstein
Bianca Dal Bó
Paula D. P. Costa
91
0
0
25 Sep 2024
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Simon Malan
Benjamin van Niekerk
Herman Kamper
109
0
0
22 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
86
0
0
17 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
95
1
0
10 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
99
0
0
03 Sep 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
73
1
0
29 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
75
0
0
21 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
49
3
0
19 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
116
2
0
12 Aug 2024
Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Ying-Shuo Lee
Yueh-Po Peng
Jui-Te Wu
Ming Cheng
Li Su
Yi-Hsuan Yang
67
1
0
23 Jul 2024
A Preliminary Investigation on Flexible Singing Voice Synthesis Through Decomposed Framework with Inferrable Features
Lester Phillip Violeta
Taketo Akama
65
0
0
12 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
99
1
0
08 Jul 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
96
21
0
20 Jun 2024
End-to-end Streaming model for Low-Latency Speech Anonymization
Waris Quamer
Ricardo Gutierrez-Osuna
96
0
0
13 Jun 2024
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Zongyang Du
Junchen Lu
Kun Zhou
Lakshmish Kaushik
Berrak Sisman
102
1
0
02 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
94
0
0
01 May 2024
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
Yimin Deng
Huaizhen Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
DRL
78
1
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
145
1
0
16 Jan 2024
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang
Y. Kartynnik
Yunpeng Li
Jiuqiang Tang
Xing Li
George Sung
Matthias Grundmann
107
15
0
05 Jan 2024
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
77
24
0
03 Dec 2023
Low-latency Real-time Voice Conversion on CPU
Konstantine Sadov
Matthew Hutter
Asara Near
VLM
68
1
0
01 Nov 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
57
4
0
12 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
71
7
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
56
5
0
08 Oct 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen-An Li
Tsung-Yuan Hsu
Tzu-Quan Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
80
6
0
25 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
Tomoki Toda
70
5
0
18 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
62
2
0
06 Sep 2023