ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

26 October 2019
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
arXiv: 1910.11997

Papers citing "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens"

50 / 78 papers shown
Text-Driven Voice Conversion via Latent State-Space Modeling
Wen Li
Sofia Martinez
Priyanka Shah
53
0
0
26 Mar 2025
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
46
4
0
04 Nov 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
45
1
0
20 Aug 2024
Re-ENACT: Reinforcement Learning for Emotional Speech Generation using Actor-Critic Strategy
Ravi Shankar
Archana Venkataraman
44
0
0
04 Aug 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
41
2
0
27 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
58
11
0
25 Jun 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
40
2
0
04 Jan 2024
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
28
0
0
07 Nov 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
35
3
0
06 Oct 2023
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
B. Halpern
Wen-Chin Huang
Lester Phillip Violeta
R.J.J.H. van Son
T. Toda
35
2
0
04 Oct 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
X. Wei
Jia Jia
Xiang Li
Zhiyong Wu
Ziyi Wang
23
1
0
21 Sep 2023
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion
Yimin Deng
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
29
7
0
21 Aug 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
28
6
0
29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
35
4
0
28 May 2023
Controllable Speaking Styles Using a Large Language Model
A. Sigurgeirsson
Simon King
25
2
0
17 May 2023
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
Kenichi Fujita
Takanori Ashihara
Hiroki Kanagawa
Takafumi Moriya
Yusuke Ijima
46
10
0
24 Apr 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
24
18
0
29 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
21
8
0
03 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
25
7
0
29 Nov 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
27
10
0
12 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
37
14
0
12 Oct 2022
The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi
Lloyd May
J. Berger
27
2
0
06 Sep 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabryś
Jaime Lorenzo-Trueba
35
5
0
29 Jul 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
33
28
0
19 Jul 2022
NatiQ: An End-to-end Text-to-Speech System for Arabic
Ahmed Abdelali
Nadir Durrani
C. Demiroğlu
Fahim Dalvi
Hamdy Mubarak
Kareem Darwish
23
14
0
15 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
42
38
0
30 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
117
34
0
15 May 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
Ryo Terashima
Ryuichi Yamamoto
Eunwoo Song
Yuma Shirahata
Hyun-Wook Yoon
Jae-Min Kim
Kentaro Tachibana
11
15
0
21 Apr 2022
Karaoker: Alignment-free singing voice synthesis with speech training data
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
June Sig Sung
Gunu Jho
Pirros Tsiakoulis
Aimilios Chalamandaris
12
3
0
08 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
18
0
0
01 Apr 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
Shuai Guo
Jiatong Shi
Tao Qian
Shinji Watanabe
Qin Jin
33
13
0
31 Mar 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Heyang Xue
Xinsheng Wang
Yongmao Zhang
Lei Xie
Pengcheng Zhu
Mengxiao Bi
DiffM
33
11
0
30 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
27
14
0
22 Mar 2022
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows
Kevin J. Shih
Rafael Valle
Rohan Badlani
J. F. Santos
Bryan Catanzaro
36
4
0
03 Mar 2022
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity
Sungjae Kim
Y.E. Kim
Jewoo Jun
Injung Kim
31
14
0
02 Mar 2022
Cross-speaker style transfer for text-to-speech using data augmentation
M. Ribeiro
Julian Roth
Giulia Comini
Goeric Huybrechts
Adam Gabryś
Jaime Lorenzo-Trueba
19
21
0
10 Feb 2022
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
21
4
0
19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
K. Markopoulos
Nikolaos Ellinas
Alexandra Vioni
Myrsini Christidou
Panos Kakoulidis
...
Georgia Maniati
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
Aimilios Chalamandaris
11
2
0
17 Nov 2021
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Shengyuan Xu
Wenxiao Zhao
Jing Guo
24
12
0
01 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
29
11
0
27 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
38
15
0
12 Oct 2021
Pitch Preservation In Singing Voice Synthesis
Shujun Liu
Hai Zhu
Kun Wang
Huajun Wang
28
0
0
11 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
26
2
0
06 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
26
42
0
14 Sep 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
34
20
0
04 Aug 2021
SurpriseNet: Melody Harmonization Conditioning on User-controlled Surprise Contours
Yi-Wei Chen
Hung-Shin Lee
Yen-Hsing Chen
Hsin-Min Wang
24
17
0
01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
13
4
0
26 Jul 2021
A Deep-Bayesian Framework for Adaptive Speech Duration Modification
Ravi Shankar
A. Venkataraman
26
0
0
11 Jul 2021