SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

20 July 2023

Papers citing "SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer"

16 / 16 papers shown

Title
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 100 7 0 06 Mar 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt Dongchao Yang Songxiang Liu Rongjie Huang Chao Weng Helen Meng DiffM VLM 89 102 0 31 Jan 2023
Unsupervised word-level prosody tagging for controllable speech synthesis Yiwei Guo Chenpeng Du Kai Yu 55 15 0 15 Feb 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 234 415 0 04 Dec 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 77 74 0 17 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 179 1,952 0 12 Oct 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction Adrian Lañcucki 96 342 0 11 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 105 1,411 0 08 Jun 2020
Understanding and Improving Layer Normalization Jingjing Xu Xu Sun Zhiyuan Zhang Guangxiang Zhao Junyang Lin FAtt 107 357 0 16 Nov 2019
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 62 820 0 25 Oct 2019
Adversarially Trained End-to-end Korean Singing Voice Synthesis System Juheon Lee Hyeong-Seok Choi Chang-Bin Jeon Junghyun Koo Kyogu Lee 75 78 0 06 Aug 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis Ya-Jie Zhang Shifeng Pan Lei He Zhenhua Ling BDL SSL DRL 66 229 0 11 Dec 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 85 2,704 0 16 Dec 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 166 1,831 0 29 Mar 2017
WaveNet: A Generative Model for Raw Audio Aaron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner A. Senior Koray Kavukcuoglu DiffM 406 7,421 0 12 Sep 2016
Sequence to Sequence Learning with Neural Networks Ilya Sutskever Oriol Vinyals Quoc V. Le AIMat 448 20,606 0 10 Sep 2014