Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

24 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron"

Showing 50 of 138 citing papers.

On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study. Hyouin Liu, Zhikuan Zhang. 12 May 2025.
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting. Wenjie Qu, Wenxiang Guo, Changhao Pan, Zehan Zhu, Tao Jin, Zhou Zhao. 29 Apr 2025.
Everyday Speech in the Indian Subcontinent. Utkarsh Pathak. 24 Feb 2025.
The Role of Prosody in Spoken Question Answering. Jie Chi, Maureen de Seyssel, Natalie Schluter. 08 Feb 2025.
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation. Anna Min, Chenxu Hu, Yi Ren, Hang Zhao. 01 Feb 2025.
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles. Tian-Hao Zhang, Jiawei Zhang, Jun Wang, Xinyuan Qian, Xu-cheng Yin. 02 Jan 2025.
Exploring synthetic data for cross-speaker style transfer in style representation based TTS. Lucas Ueda, Leonardo B. de M. M. Marques, Flávio O. Simões, Mário Uliani Neto, Fernando Runstein, Bianca Dal Bó, Paula D. P. Costa. 25 Sep 2024.
Controlling Emotion in Text-to-Speech with Natural Language Prompts. Thomas Bott, Florian Lux, Ngoc Thang Vu. 10 Jun 2024.
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation. Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee. 04 Jun 2024.
Audio Anti-Spoofing Detection: A Survey. Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang. 22 Apr 2024.
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition. Rendi Chevi, Alham Fikri Aji. 22 Feb 2024.
Natural language guidance of high-fidelity text-to-speech with synthetic annotations. Daniel Lyth, Simon King. 02 Feb 2024.
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis. Yu Zhang, Rongjie Huang, Ruiqi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao. 17 Dec 2023.
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes. Seongho Joo, Hyukhun Koh, Kyomin Jung. 23 Oct 2023.
Prosody Analysis of Audiobooks. Charuta Pethe, Felix D Childress, Yunting Yin, Steven Skiena. 10 Oct 2023.
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin. Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Linfu Xie. 02 Sep 2023.
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis. Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng. 30 Aug 2023.
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech. Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang. 28 Aug 2023.
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis. Shunwei Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng. 29 Jul 2023.
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding. Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Linfu Xie, Mengxiao Bi. 21 May 2023.
Controllable Speaking Styles Using a Large Language Model. A. Sigurgeirsson, Simon King. 17 May 2023.
Using Deepfake Technologies for Word Emphasis Detection. Eran Kaufman, Lee-Ad Gottlieb. 12 May 2023.
Improving Autoregressive NLP Tasks via Modular Linearized Attention. Victor Agostinelli, Lizhong Chen. 17 Apr 2023.
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents. Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim. 15 Mar 2023.
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis. Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhong-ming Wang. 14 Mar 2023.
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson, Simon King. 07 Mar 2023.
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt. Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng. 31 Jan 2023.
Modelling low-resource accents without accent-specific TTS frontend. Georgi Tinchev, Marta Czarnowska, Kamil Deja, K. Yanagisawa, Marius Cotescu. 11 Jan 2023.
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis. Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh. 15 Dec 2022.
Contextual Expressive Text-to-Speech. Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kaiqin Hu, Ju Fan, Chang Zhou. 26 Nov 2022.
Prosody-controllable spontaneous TTS with neural HMMs. Harm Lameris, Shivam Mehta, G. Henter, Joakim Gustafson, Éva Székely. 24 Nov 2022.
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints. Zhichao Wang, Xinsheng Wang, Linfu Xie, Yuan-Jui Chen, Qiao Tian, Yuping Wang. 16 Nov 2022.
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis. Karolos Nikitaras, Konstantinos Klapsas, Nikolaos Ellinas, Georgia Maniati, June Sig Sung, Inchul Hwang, S. Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis. 01 Nov 2022.
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection. L. Attorresi, Davide Salvi, Clara Borrelli, Paolo Bestagini, Stefano Tubaro. 31 Oct 2022.
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders. Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Kohler, Qing He. 28 Oct 2022.
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech. Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, N. Kim. 12 Oct 2022.
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era. Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, M. Sezgin, Xiangheng He, ..., Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, Jianhua Tao. 06 Oct 2022.
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks. Cassia Valentini-Botinhao, M. Ribeiro, O. Watts, Korin Richmond, G. Henter. 22 Sep 2022.
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline. Yifan Hu, Pengkai Yin, Rui Liu, F. Bao, Guanglai Gao. 22 Sep 2022.
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS. Liumeng Xue, Frank Soong, Shaofei Zhang, Linfu Xie. 14 Sep 2022.
Speech Synthesis with Mixed Emotions. Kun Zhou, Berrak Sisman, R. Rana, B. W. Schuller, Haizhou Li. 11 Aug 2022.
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset. Xiang Li, Changhe Song, X. Wei, Zhiyong Wu, Jia Jia, Helen Meng. 10 Aug 2022.
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis. Qibing Bai, Tom Ko, Yu Zhang. 03 Aug 2022.
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech. Zhengxi Liu, Qiao Tian, Chenxu Hu, Xudong Liu, Meng-Che Wu, Yuping Wang, Hang Zhao, Yuxuan Wang. 13 Jul 2022.
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS). Ariadna Sánchez, Alessio Falai, Ziyao Zhang, Orazio Angelini, K. Yanagisawa. 04 Jul 2022.
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS). Ziyao Zhang, Alessio Falai, Ariadna Sánchez, Orazio Angelini, K. Yanagisawa. 04 Jul 2022.
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre. Guangyan Zhang, Ying Qin, Wenbo Zhang, Jialun Wu, Mei Li, Yu Gai, Feijun Jiang, Tan Lee. 29 Jun 2022.
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer. S. Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Raymond Li, Arent van Korlaar, Simon Slangen, Thomas Drugman. 27 Jun 2022.
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis. Yihan Wu, Xi Wang, S. Zhang, Lei He, Ruihua Song, J. Nie. 25 Jun 2022.
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis. Yinghao Aaron Li, Cong Han, N. Mesgarani. 30 May 2022.