FastPitch: Parallel Text-to-speech with Pitch Prediction

11 June 2020

Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"

50 / 173 papers shown

Title
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions Florian Lux Pascal Tilli Sarina Meyer Ngoc Thang Vu 23 2 0 26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023 Florian Lux Julia Koch Sarina Meyer Thomas Bott Nadja Schauffler Pavel Denisov Antje Schweitzer Ngoc Thang Vu 24 6 0 26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 47 0 0 23 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations Paarth Neekhara Shehzeen Samarah Hussain Rafael Valle Boris Ginsburg Rishabh Ranjan Shlomo Dubnov F. Koushanfar Julian McAuley 18 3 0 14 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition Nick Rossenbach Benedikt Hilmes Ralf Schluter 12 3 0 12 Oct 2023
Prosody Analysis of Audiobooks Charuta Pethe Yunting Yin Felix D Childress Yunting Yin Steven Skiena 14 0 0 10 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset Ze Liu 24 0 0 08 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization Alexandra Antonova 43 0 0 29 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis X. Wei Jia Jia Xiang Li Zhiyong Wu Ziyi Wang 18 0 0 21 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform Yinghao Aaron Li Cong Han Xilin Jiang N. Mesgarani 38 4 0 18 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs Md Awsafur Rahman Bishmoy Paul Najibul Haque Sarker Zaber Ibn Abdul Hakim S. Fattah Mohammad Saquib 6 3 0 15 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation Yong Li Cheng Yu Guangzhi Sun Weiqin Zu Zheng Tian ... Wei Pan Chao Zhang Jun Wang Yang Yang Fanglei Sun 21 2 0 08 Sep 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 C. Veaux R. Maia Spyridoula Papendreou 20 1 0 30 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos Ji-Hoon Kim Jaehun Kim Joon Son Chung 30 5 0 29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Min Zhang Björn W. Schuller LM&MA AuLLM 33 38 0 24 Aug 2023
Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection Cunhang Fan Jun Xue J. Tao Jiangyan Yi Chenglong Wang C. Zheng Zhao Lv 28 8 0 19 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS Myeongji Ko Yong-Hoon Choi DiffM 20 1 0 03 Aug 2023
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech Guangyan Zhang Thomas Merritt M. Ribeiro Biel Tura Vecino K. Yanagisawa ... Ammar Abbas Piotr Bilinski Roberto Barra-Chicote Daniel Korzekwa Jaime Lorenzo-Trueba DiffM 36 3 0 31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training H. Oh Sang-Hoon Lee Seong-Whan Lee DiffM 20 14 0 31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Xixin Wu Shiyin Kang Helen Meng 30 7 0 29 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer Daegyeom Kim Seong-soo Hong Yong-Hoon Choi 25 2 0 20 Jul 2023
An analysis on the effects of speaker embedding choice in non auto-regressive TTS Adriana Stan Johannah O'Mahony 39 0 0 19 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Siyang Wang G. Henter Joakim Gustafson Éva Székely 47 5 0 11 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Matt Le Apoorv Vyas Bowen Shi Brian Karrer Leda Sari ... Mary Williamson Vimal Manohar Yossi Adi Jay Mahadeokar Wei-Ning Hsu AuLLM 28 265 0 23 Jun 2023
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody Sofoklis Kakouros J. Šimko M. Vainio Antti Suni 18 5 0 16 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling Ji-Sang Hwang Sang-Hoon Lee Seong-Whan Lee 24 4 0 13 Jun 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings Alexandra Antonova Evelina Bakhturina Boris Ginsburg KELM 17 6 0 04 Jun 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis Erik Ekstedt Siyang Wang Éva Székely Joakim Gustafson Gabriel Skantze 21 6 0 29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS Sewade Ogun Vincent Colotte Emmanuel Vincent DiffM 35 4 0 28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model Xiang Li Songxiang Liu Max W. Y. Lam Zhiyong Wu Chao Weng Helen Meng DiffM 21 5 0 26 May 2023
EfficientSpeech: An On-Device Text to Speech Model Rowel Atienza 31 4 0 23 May 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model Jianzong Wang Xulong Zhang Haobin Tang Aolan Sun Ning Cheng Jing Xiao 23 1 0 23 Apr 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Junaid Qadir 42 47 0 21 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 34 7 0 06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS Siyang Wang G. Henter Joakim Gustafson Éva Székely 38 4 0 05 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations N. Shah Saiteja Kosgi Vishal Tambrahalli Neha Sahipjohn Anil Nelakanti Vineet Gandhi 19 8 0 01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis Ji-Hoon Kim Hongying Yang Yooncheol Ju Il-Hwan Kim Byeong-Yeol Kim 30 8 0 28 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator Vladimir Bataev Roman Korostik Evgeny Shabalin Vitaly Lavrukhin Boris Ginsburg VLM 36 14 0 27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow Yoonhyung Lee Jinhyeok Yang Kyomin Jung 28 6 0 27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee Wonbin Jung Hyunjae Cho Jaeyeon Kim Jaehwan Kim 17 3 0 24 Feb 2023
Modular Deep Learning Jonas Pfeiffer Sebastian Ruder Ivan Vulić E. Ponti MoMe OOD 32 73 0 22 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations Shehzeen Samarah Hussain Paarth Neekhara Jocelyn Huang Jason Chun Lok Li Boris Ginsburg 13 21 0 16 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Francesco Ferroni Bryan Catanzaro 16 6 0 24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe W. El-Hajj Ahmed M. Ali 22 10 0 22 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation Abdullah Shahid S. Latif Junaid Qadir 31 23 0 10 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Shinhyeok Oh HyeongRae Noh Yoonseok Hong Insoo Oh 20 0 0 15 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas Myrsini Christidou Alexandra Vioni June Sig Sung Aimilios Chalamandaris Pirros Tsiakoulis P. Mastorocostas 17 7 0 29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions Christoph Minixhofer Ondˇrej Klejch P. Bell 36 7 0 29 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 38 18 0 17 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 11 9 0 13 Nov 2022