ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.06873
  4. Cited By
FastPitch: Parallel Text-to-speech with Pitch Prediction

FastPitch: Parallel Text-to-speech with Pitch Prediction

11 June 2020
Adrian Lañcucki
ArXivPDFHTML

Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"

50 / 173 papers shown
Title
Controllable Generation of Artificial Speaker Embeddings through
  Discovery of Principal Directions
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
23
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
24
6
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal
  point processes
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
47
0
0
23 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self
  Transformations
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
18
3
0
14 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training
  Data for Automatic Speech Recognition
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
12
3
0
12 Oct 2023
Prosody Analysis of Audiobooks
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
14
0
0
10 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning
  Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
0
0
08 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR
  Customization
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
43
0
0
29 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion
  Analysis
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
X. Wei
Jia Jia
Xiang Li
Zhiyong Wu
Ziyi Wang
18
0
0
21 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise
  Filter and Inverse Short Time Fourier Transform
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
38
4
0
18 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown
  Multi-Class Ensemble of CNNs
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs
Md Awsafur Rahman
Bishmoy Paul
Najibul Haque Sarker
Zaber Ibn Abdul Hakim
S. Fattah
Mohammad Saquib
6
3
0
15 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Yong Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
21
2
0
08 Sep 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
20
1
0
30 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent
  Videos
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
30
5
0
29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
33
38
0
24 Aug 2023
Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake
  Speech Detection
Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
Cunhang Fan
Jun Xue
J. Tao
Jiangyan Yi
Chenglong Wang
C. Zheng
Zhao Lv
28
8
0
19 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual
  Discriminators for High-Fidelity Multi-Speaker TTS
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
20
1
0
03 Aug 2023
Comparing normalizing flows and diffusion models for prosody and
  acoustic modelling in text-to-speech
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Guangyan Zhang
Thomas Merritt
M. Ribeiro
Biel Tura Vecino
K. Yanagisawa
...
Ammar Abbas
Piotr Bilinski
Roberto Barra-Chicote
Daniel Korzekwa
Jaime Lorenzo-Trueba
DiffM
36
3
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive
  Speech Synthesis with Prosody Conditional Adversarial Training
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
20
14
0
31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context
  Information for Expressive Speech Synthesis
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
30
7
0
29 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
25
2
0
20 Jul 2023
An analysis on the effects of speaker embedding choice in non
  auto-regressive TTS
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
Adriana Stan
Johannah O'Mahony
39
0
0
19 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous
  Speech Synthesis
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
47
5
0
11 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
28
265
0
23 Jun 2023
Investigating the Utility of Surprisal from Large Language Models for
  Speech Synthesis Prosody
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody
Sofoklis Kakouros
J. Šimko
M. Vainio
Antti Suni
18
5
0
16 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and
  Pause-based Prosody Modeling
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
24
4
0
13 Jun 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR
  customization with candidate retrieval based on n-gram mappings
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
Alexandra Antonova
Evelina Bakhturina
Boris Ginsburg
KELM
17
6
0
04 Jun 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech
  Synthesis
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
21
6
0
29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of
  Speech in Glow-TTS
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
35
4
0
28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising
  Diffusion Probabilistic Model
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
21
5
0
26 May 2023
EfficientSpeech: An On-Device Text to Speech Model
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
31
4
0
23 May 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End
  Speech Model
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
23
1
0
23 Apr 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read
  and Spontaneous TTS
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
38
4
0
05 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised
  representations
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
19
8
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for
  Cross-lingual Speech Synthesis
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
30
8
0
28 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated
  text-to-mel-spectrogram generator
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
36
14
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using
  Variance Information via Normalizing Flow
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
28
6
0
27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for
  End-to-End Pitch-controllable TTS
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Modular Deep Learning
Modular Deep Learning
Jonas Pfeiffer
Sebastian Ruder
Ivan Vulić
E. Ponti
MoMe
OOD
32
73
0
22 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly
  Disentangled Self-supervised Speech Representations
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Francesco Ferroni
Bryan Catanzaro
16
6
0
24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a
  Case Study
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
22
10
0
22 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for
  Synthetic Emotional Speech Augmentation
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
31
23
0
10 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural
  Text-to-Speech Synthesis
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
20
0
0
15 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level
  prosodic representations
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
17
7
0
29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech
  distributions
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondˇrej Klejch
P. Bell
36
7
0
29 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech
  Representation using Differentiable Digital Signal Processing
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
11
9
0
13 Nov 2022
Previous
1234
Next