
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

3 November 2021


Papers citing "A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion"

20 / 70 papers shown

Title | Authors | Topic tag | (three unlabeled counts) | Date
FSD: An Initial Chinese Dataset for Fake Song Detection | Yuankun Xie, Jingjing Zhou, Xiaolin Lu, Zhenghao Jiang, Yuxin Yang, Haonan Cheng, Long Ye | - | 88 15 0 | 05 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling | Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Linfu Xie, Qiao Tian, Yuping Wang | - | 114 4 0 | 03 Sep 2023
Vocoder drift compensation by x-vector alignment in speaker anonymisation | Michele Panariello, Massimiliano Todisco, Nicholas W. D. Evans | - | 65 2 0 | 17 Jul 2023
Rhythm Modeling for Voice Conversion | Benjamin van Niekerk, M. Carbonneau, Herman Kamper | - | 74 8 0 | 12 Jul 2023
Disentanglement in a GAN for Unconditional Speech Synthesis | Matthew Baas, Herman Kamper | DiffM | 84 4 0 | 04 Jul 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li | - | 85 4 0 | 29 Jun 2023
The Singing Voice Conversion Challenge 2023 | Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda | - | 116 50 0 | 26 Jun 2023
Zero-Shot Automatic Pronunciation Assessment | Hongfu Liu, Mingqiang Shi, Ye Wang | - | 53 4 0 | 31 May 2023
Voice Conversion With Just Nearest Neighbors | Matthew Baas, Benjamin van Niekerk, Herman Kamper | SSL | 115 61 0 | 30 May 2023
Speaker anonymization using orthogonal Householder neural network | Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, N. Tomashenko | BDL | 74 21 0 | 30 May 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion | Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng | - | 71 3 0 | 16 May 2023
Multi-modal Facial Affective Analysis based on Masked Autoencoder | Wei Zhang, Bowen Ma, Feng Qiu, Yu-qiong Ding | CVBM | 99 29 0 | 20 Mar 2023
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions | Jun Rekimoto | - | 96 20 0 | 03 Mar 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion | Houjian Guo, Chaoran Liu, C. Ishi, H. Ishiguro | BDL | 100 13 0 | 16 Feb 2023
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline | Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, J. Bonastre, D. Matrouf | - | 97 7 0 | 29 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis | Bryce Irvin, Marko Stamenovic, M. Kegler, Li-Chia Yang | - | 88 21 0 | 04 Nov 2022
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations | Themos Stafylakis, Ladislav Mošner, Sofoklis Kakouros, Oldrich Plchot, L. Burget, J. Černocký | SSL | 60 10 0 | 15 Oct 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery | W. V. D. Merwe, Herman Kamper, J. D. Preez | - | 47 2 0 | 23 Jun 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions | Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, N. Tomashenko | - | 70 11 0 | 28 Mar 2022
Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models | Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, N. Tomashenko | - | 168 25 0 | 26 Feb 2022