2006.06873
FastPitch: Parallel Text-to-speech with Pitch Prediction
11 June 2020
Adrian Łańcucki
Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"
50 / 173 papers shown
Title
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
23
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
24
6
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
47
0
0
23 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
18
3
0
14 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition
Nick Rossenbach
Benedikt Hilmes
Ralf Schlüter
12
3
0
12 Oct 2023
Prosody Analysis of Audiobooks
Charuta Pethe
Felix D Childress
Yunting Yin
Steven Skiena
14
0
0
10 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
0
0
08 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
43
0
0
29 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
X. Wei
Jia Jia
Xiang Li
Zhiyong Wu
Ziyi Wang
18
0
0
21 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
38
4
0
18 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs
Md Awsafur Rahman
Bishmoy Paul
Najibul Haque Sarker
Zaber Ibn Abdul Hakim
S. Fattah
Mohammad Saquib
6
3
0
15 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yong Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
21
2
0
08 Sep 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
20
1
0
30 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
30
5
0
29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
33
38
0
24 Aug 2023
Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
Cunhang Fan
Jun Xue
J. Tao
Jiangyan Yi
Chenglong Wang
C. Zheng
Zhao Lv
28
8
0
19 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
20
1
0
03 Aug 2023
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Guangyan Zhang
Thomas Merritt
M. Ribeiro
Biel Tura Vecino
K. Yanagisawa
...
Ammar Abbas
Piotr Bilinski
Roberto Barra-Chicote
Daniel Korzekwa
Jaime Lorenzo-Trueba
DiffM
36
3
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
20
14
0
31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
30
7
0
29 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
25
2
0
20 Jul 2023
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
Adriana Stan
Johannah O'Mahony
39
0
0
19 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
47
5
0
11 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
28
265
0
23 Jun 2023
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody
Sofoklis Kakouros
J. Šimko
M. Vainio
Antti Suni
18
5
0
16 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
24
4
0
13 Jun 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
Alexandra Antonova
Evelina Bakhturina
Boris Ginsburg
KELM
17
6
0
04 Jun 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
21
6
0
29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
35
4
0
28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
21
5
0
26 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
31
4
0
23 May 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
23
1
0
23 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
38
4
0
05 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
19
8
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
30
8
0
28 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
36
14
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
28
6
0
27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Modular Deep Learning
Jonas Pfeiffer
Sebastian Ruder
Ivan Vulić
E. Ponti
MoMe
OOD
32
73
0
22 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Francesco Ferroni
Bryan Catanzaro
16
6
0
24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
22
10
0
22 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
31
23
0
10 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
20
0
0
15 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
17
7
0
29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondřej Klejch
P. Bell
36
7
0
29 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
11
9
0
13 Nov 2022