AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

14 May 2019

Kaizhi Qian

Papers citing "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss"

50 / 105 papers shown

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions Wonjune Kang M. Hasegawa-Johnson D. Roy 37 8 0 19 May 2022
Dictionary Attacks on Speaker Verification Mirko Marras Pawel Korus Anubhav Jain N. Memon AAML 34 9 0 24 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers Kaizhi Qian Yang Zhang Heting Gao Junrui Ni Cheng-I Jeff Lai David D. Cox M. Hasegawa-Johnson Shiyu Chang DRL 30 110 0 20 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion Weida Liang Lantian Li Wenqiang Du Dong Wang 56 0 0 08 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 18 32 0 31 Mar 2022
HiFi-VC: High Quality ASR-Based Voice Conversion A. Kashkin I. Karpukhin S. Shishkin 29 5 0 31 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows Thomas Merritt Abdelhamid Ezzerg Piotr Bilinski Magdalena Proszewska Kamil Pokora Roberto Barra-Chicote Daniel Korzekwa 36 14 0 15 Mar 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier Jinglin Liu Chengxi Li Yi Ren Zhiying Zhu Zhou Zhao DiffM 35 16 0 27 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge Ziyi Chen Hua Hua Yuxiang Zhang Ming Li Pengyuan Zhang 27 0 0 29 Jan 2022
Noise-robust voice conversion with domain adversarial training Hongqiang Du Lei Xie Haizhou Li 19 11 0 26 Jan 2022
Invertible Voice Conversion Zexin Cai Ming Li BDL 38 1 0 26 Jan 2022
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering Shunyu Yao Ruizhe Zhong Yichao Yan Guangtao Zhai Xiaokang Yang CVBM 32 90 0 03 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 15 14 0 02 Jan 2022
Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features Trung D. Q. Dang Dung T. Tran Peter Chin K. Koishida SSL 19 15 0 08 Dec 2021
Zero-shot Singing Technique Conversion Brendan O'Connor S. Dixon Georgy Fazekas 35 5 0 16 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion Damien Ronssin Milos Cernak 20 10 0 12 Nov 2021
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines Haozhe Zhang Zexin Cai Xiaoyi Qin Ming Li 54 15 0 06 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion Benjamin van Niekerk M. Carbonneau Julian Zaïdi Matthew Baas Hugo Seuté Herman Kamper DRL 27 111 0 03 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations Hyeong-Seok Choi Juheon Lee W. Kim Jie Hwan Lee Hoon Heo Kyogu Lee 37 151 0 27 Oct 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 29 11 0 27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 18 24 0 20 Oct 2021
Toward Degradation-Robust Voice Conversion Chien-yu Huang Kai-Wei Chang Hung-yi Lee 38 7 0 14 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions F. Bous L. Benaroya Nicolas Obin Axel Roebel 19 2 0 07 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu Jinxiang Chai Xun Cao 29 82 0 22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 40 39 0 20 Sep 2021
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks Russell Sammut Bonnici C. Saitis Martin Benning GAN 36 15 0 05 Sep 2021
StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition Shoki Sakamoto Akira Taniguchi T. Taniguchi Hirokazu Kameoka BDL 31 5 0 10 Aug 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion Yinghao Aaron Li A. Zare N. Mesgarani 35 99 0 21 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion Disong Wang Liqun Deng Y. Yeung Xiao Chen Xunying Liu Helen Meng DRL 22 136 0 18 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations Jheng-hao Lin Yist Y. Lin C. Chien Hung-yi Lee 30 56 0 07 Apr 2021
Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads S. Nercessian Andy M. Sarroff K. Werner 17 29 0 15 Mar 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Nobukatsu Hojo 38 57 0 25 Feb 2021
Accent and Speaker Disentanglement in Many-to-many Voice Conversion Zhichao Wang Wenshuo Ge Xiong Wang Shan Yang Wendong Gan Haitao Chen Hai Li Lei Xie Xiulin Li CVBM 36 32 0 17 Nov 2020
Optimizing voice conversion network with cycle consistency loss of speaker identity Hongqiang Du Xiaohai Tian Lei Xie Haizhou Li 21 17 0 17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 29 21 0 08 Nov 2020
Semi-supervised Learning for Singing Synthesis Timbre J. Bonada Merlijn Blaauw 27 4 0 05 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization Yen-Hao Chen Da-Yi Wu Tsung-Han Wu Hung-yi Lee 34 107 0 31 Oct 2020
PPG-based singing voice conversion with adversarial representation learning Zhonghao Li Benlai Tang Xiang Yin Yuan Wan Linjia Xu Chen Shen Zejun Ma 19 37 0 28 Oct 2020
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Nobukatsu Hojo 29 78 0 22 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS Wen-Chin Huang Tomoki Hayashi Shinji Watanabe T. Toda DRL 15 39 0 06 Oct 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion Yi Zhao Wen-Chin Huang Xiaohai Tian Junichi Yamagishi Rohan Kumar Das Tomi Kinnunen Zhenhua Ling T. Toda 27 206 0 28 Aug 2020
VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture Da-Yi Wu Yen-Hao Chen Hung-yi Lee 10 99 0 07 Jun 2020
Speech-to-Singing Conversion based on Boundary Equilibrium GAN Da-Yi Wu Yi-Hsuan Yang GAN 14 8 0 28 May 2020
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations Janek Ebbers Michael Kuhlmann Tobias Cord-Landwehr Reinhold Haeb-Umbach DRL CoGe SSL 31 4 0 26 May 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda ViT 30 30 0 18 May 2020
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion Kun Zhou Berrak Sisman Mingyang Zhang Haizhou Li 30 52 0 13 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data Seung-won Park Doo-young Kim Myun-chul Joe 26 40 0 07 May 2020
Zero-Shot Learning and its Applications from Autonomous Vehicles to COVID-19 Diagnosis: A Review Mahdi Rezaei Mahsa Shahidi 29 53 0 29 Apr 2020