Deep Voice 2: Multi-Speaker Neural Text-to-Speech

24 May 2017

Papers citing "Deep Voice 2: Multi-Speaker Neural Text-to-Speech"

50 / 87 papers shown

SF-Speech: Straightened Flow for Zero-Shot Voice Clone. Xuyuan Li, Zengqiang Shang, Hua Hua, Peiyang Shi, Chen Yang, Li Wang, Pengyuan Zhang. 16 Oct 2024.
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection. Ismail Rasim Ulgen, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman. 30 Aug 2024.
Speech as Interactive Design Material (SIDM): How to design and evaluate task-tailored synthetic voices? Mateusz Dubiel, M. Aylett, Anuschka Schmitt, Zilin Ma, Gary Hsieh, Thiemo Wambsganss. 26 Feb 2024.
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin. Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Linfu Xie. 02 Sep 2023.
An analysis on the effects of speaker embedding choice in non auto-regressive TTS. Adriana Stan, Johannah O'Mahony. 19 Jul 2023.
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation. K. Lakshminarayana, C. Dittmar, N. Pia, Emanuel Habets. 16 Jun 2023.
Using Deepfake Technologies for Word Emphasis Detection. Eran Kaufman, Lee-Ad Gottlieb. 12 May 2023.
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson, Simon King. 07 Mar 2023.
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities. Amin Azmoodeh, Ali Dehghantanha. 26 Nov 2022.
Contextual Expressive Text-to-Speech. Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kaiqin Hu, Ju Fan, Chang Zhou. 26 Nov 2022.
Towards Building Text-To-Speech Systems for the Next Billion Users. Gokul Karthik Kumar, V. Praveen S., Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar. 17 Nov 2022.
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li. 27 Oct 2022.
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era. Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, M. Sezgin, Xiangheng He, ..., Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, Jianhua Tao. 06 Oct 2022.
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech. Yusuke Nakai, Yuki Saito, K. Udagawa, Hiroshi Saruwatari. 26 Sep 2022.
Controllable Accented Text-to-Speech Synthesis. Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li. 22 Sep 2022.
Visualising Model Training via Vowel Space for Text-To-Speech Systems. Binu Abeysinghe, Jesin James, C. Watson, Felix Marattukalam. 21 Aug 2022.
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion. Yinjiao Lei, Shan Yang, Jian Cong, Linfu Xie, Dan Su. 05 Jul 2022.
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre. Guangyan Zhang, Ying Qin, Wenbo Zhang, Jialun Wu, Mei Li, Yu Gai, Feijun Jiang, Tan Lee. 29 Jun 2022.
Show Me Your Face, And I'll Tell You How You Speak. Christen Millerdurai, L. A. Khaliq, Timon Ulrich. 28 Jun 2022.
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality. Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, ..., Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu. 09 May 2022.
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios. Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu. 01 Apr 2022.
Variational Auto-Encoder based Mandarin Speech Cloning. Qingyu Xing, Xiaohan Ma. 06 Mar 2022.
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs. Songxiang Liu, Dan Su, Dong Yu. 28 Jan 2022.
Disentangling Style and Speaker Attributes for TTS Style Transfer. Xiaochun An, Frank Soong, Lei Xie. 24 Jan 2022.
Textless Speech-to-Speech Translation on Real Data. Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, ..., Sravya Popuri, Yossi Adi, J. Pino, Jiatao Gu, Wei-Ning Hsu. 15 Dec 2021.
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey. Zahra Khanjani, Gabrielle Watson, V. P. Janeja. 28 Nov 2021.
V2C: Visual Voice Cloning. Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu. 25 Nov 2021.
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech. Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee. 07 Nov 2021.
Emotional Prosody Control for Speech Generation. S. Sivaprasad, Saiteja Kosgi, Vineet Gandhi. 07 Nov 2021.
WaveFake: A Data Set to Facilitate Audio Deepfake Detection. Joel Frank, Lea Schönherr. 04 Nov 2021.
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data. Haitong Zhang, Yue Lin. 14 Oct 2021.
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models. Jen-Hao Rick Chang, A. Shrivastava, H. Koppula, Xiaoshuai Zhang, Oncel Tuzel. 06 Oct 2021.
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks. E. Hortal, Rodrigo Brechard Alarcia. 06 Oct 2021.
A Survey on Neural Speech Synthesis. Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu. 29 Jun 2021.
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis. Jinhyeok Yang, Jaesung Bae, Taejun Bak, Young-Ik Kim, Hoon-Young Cho. 29 Jun 2021.
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control. M. Kang, Sungjae Kim, Injung Kim. 21 Jun 2021.
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Dong Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang. 06 Jun 2021.
Recent Advances and Trends in Multimodal Deep Learning: A Review. Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Abdul Jabbar. 24 May 2021.
Speaker disentanglement in video-to-speech conversion. Dan Oneaţă, Adriana Stan, H. Cucu. 20 May 2021.
Review of end-to-end speech synthesis technology based on deep learning. Zhaoxi Mu, Xinyu Yang, Yizhuo Dong. 20 Apr 2021.
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction. Stanislav Beliaev, Boris Ginsburg. 16 Apr 2021.
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu. 28 Mar 2021.
AdaSpeech: Adaptive Text to Speech for Custom Voice. Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu. 01 Mar 2021.
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention. Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su. 12 Feb 2021.
Universal Neural Vocoding with Parallel WaveNet. Yunlong Jiao, Adam Gabryś, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, V. Klimkov. 01 Feb 2021.
Low-resource expressive text-to-speech using data augmentation. Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba. 11 Nov 2020.
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines. Yao Shi, Hui Bu, Xin Xu, Shaojing Zhang, Ming Li. 22 Oct 2020.
DiffWave: A Versatile Diffusion Model for Audio Synthesis. Zhifeng Kong, Ming-Yu Liu, Jiaji Huang, Kexin Zhao, Bryan Catanzaro. 21 Sep 2020.
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu. 03 Sep 2020.
Adversarial representation learning for private speech generation. David Ericsson, Adam Östberg, Edvin Listo Zec, John Martinsson, Olof Mogren. 16 Jun 2020.