YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

4 December 2021

Edresson Casanova

Julian Weber

C. Shulby

Arnaldo Cândido Júnior

Eren Golge

M. Ponti

ArXiv PDF HTML

Papers citing "YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone"

28 / 78 papers shown

Title
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer Xiaofei Wang Manthan Thakker Zhuo Chen Naoyuki Kanda Sefik Emre Eskimez Sanyuan Chen M. Tang Shujie Liu Jinyu Li Takuya Yoshioka 18 79 0 14 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer Sang-Hoon Lee Haram Choi H. Oh Seong-Whan Lee BDL 23 9 0 30 Jul 2023
An analysis on the effects of speaker embedding choice in non auto-regressive TTS Adriana Stan Johannah O'Mahony 32 0 0 19 Jul 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhe Ye Ziyue Jiang Yi Ren Jinglin Liu Chen Zhang Xiang Yin Zejun Ma Zhou Zhao 40 4 0 06 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison Tobias Bollinger Jan Deriu Manfred Vogel DiffM 16 0 0 31 May 2023
Voice Conversion With Just Nearest Neighbors Matthew Baas Benjamin van Niekerk Herman Kamper SSL 32 48 0 30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus Yuma Koizumi Heiga Zen Shigeki Karita Yifan Ding Kohei Yatabe Nobuyuki Morioka M. Bacchiani Yu Zhang Wei Han Ankur Bapna 36 66 0 30 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis Seong-Hyun Park Bohyung Kim Tae-Hyun Oh 32 1 0 26 May 2023
An Android Robot Head as Embodied Conversational Agent Marcel Heisler C. Becker-Asano LM&Ro LLMAG 29 0 0 18 May 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations Shehzeen Samarah Hussain Paarth Neekhara Jocelyn Huang Jason Chun Lok Li Boris Ginsburg 13 21 0 16 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Chengyi Wang Sanyuan Chen Yu-Huan Wu Zi-Hua Zhang Long Zhou ... Huaming Wang Jinyu Li Lei He Sheng Zhao Furu Wei 43 641 0 05 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models Yinghao Aaron Li Cong Han N. Mesgarani 17 18 0 29 Dec 2022
Memories are One-to-Many Mapping Alleviators in Talking Face Generation Anni Tang Tianyu He Xuejiao Tan Jun Ling Liang Song CVBM 26 23 0 09 Dec 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 36 18 0 17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 18 46 0 17 Nov 2022
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement Anastasia Kuznetsova Aswin Sivaraman Minje Kim 24 3 0 14 Nov 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 24 5 0 12 Oct 2022
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild Xuechen Liu Xin Wang Md. Sahidullah J. Patino Héctor Delgado ... Massimiliano Todisco Junichi Yamagishi Nicholas W. D. Evans A. Nautsch Kong Aik Lee 32 173 0 05 Oct 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Mei-Shuo Chen Z. Duan 22 10 0 23 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation Zalan Borsos Raphaël Marinier Damien Vincent Eugene Kharitonov Olivier Pietquin ... Dominik Roblek O. Teboul David Grangier Marco Tagliasacchi Neil Zeghidour AuLLM 41 567 0 07 Sep 2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) Ariadna Sánchez Alessio Falai Ziyao Zhang Orazio Angelini K. Yanagisawa 38 7 0 04 Jul 2022
Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models Alon Levkovitch Eliya Nachmani Lior Wolf DiffM 19 29 0 05 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 33 38 0 30 May 2022
Heterogeneous Target Speech Separation Hyunjae Cho Wonbin Jung Junhyeok Lee Paris Smaragdis Sanghyun Woo 46 26 0 07 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios Yihan Wu Xu Tan Bohan Li Lei He Sheng Zhao Ruihua Song Tao Qin Tie-Yan Liu VLM DiffM 14 66 0 01 Apr 2022
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 223 239 0 25 Sep 2019
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 224 2,234 0 14 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Z. Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 207 820 0 12 Jun 2018