ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1810.07217
  4. Cited By
Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhehuai Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
    BDL
ArXivPDFHTML

Papers citing "Hierarchical Generative Modeling for Controllable Speech Synthesis"

43 / 93 papers shown
Title
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
41
6
0
10 May 2021
Disentangled Sequence Clustering for Human Intention Inference
Disentangled Sequence Clustering for Human Intention Inference
Mark Zolotas
Y. Demiris
DRL
24
5
0
23 Jan 2021
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang
Yi Ren
Xu Tan
Jinglin Liu
Ke-jun Zhang
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
39
37
0
17 Dec 2020
Measuring Disentanglement: A Review of Metrics
Measuring Disentanglement: A Review of Metrics
M. Carbonneau
Julian Zaïdi
Jonathan Boilard
G. Gagnon
CoGe
DRL
28
81
0
16 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
29
5
0
14 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
30
11
0
24 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
29
36
0
12 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech
  Synthesis via Phone-Level Content-Style Disentanglement
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
29
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
24
98
0
06 Nov 2020
Unsupervised Learning of Disentangled Speech Content and Style
  Representation
Unsupervised Learning of Disentangled Speech Content and Style Representation
Andros Tjandra
Ruoming Pang
Yu Zhang
Shigeki Karita
BDL
DRL
23
15
0
24 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech
  Synthesis
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Rui Liu
Berrak Sisman
Haizhou Li
18
24
0
23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
27
102
0
22 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge
  2020: Cascading ASR and TTS
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Wen-Chin Huang
Tomoki Hayashi
Shinji Watanabe
T. Toda
DRL
13
39
0
06 Oct 2020
Neural voice cloning with a few low-quality samples
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
33
2
0
12 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
23
109
0
08 Jun 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
16
5
0
21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based
  Variable-Length Embedding
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
37
65
0
18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by
  Text-To-Speech Data Augmentation
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for
  Text-to-Speech Synthesis
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
21
119
0
12 May 2020
Jukebox: A Generative Model for Music
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
52
722
0
30 Apr 2020
The Attacker's Perspective on Automatic Speaker Verification: An
  Overview
The Attacker's Perspective on Automatic Speaker Verification: An Overview
Rohan Kumar Das
Xiaohai Tian
Tomi Kinnunen
Haizhou Li
AAML
20
81
0
19 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual
  Information for Speech Synthesis
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
11
30
0
09 Mar 2020
Deterministic Decoding for Discrete Data in Variational Autoencoders
Deterministic Decoding for Discrete Data in Variational Autoencoders
Daniil Polykovskiy
Dmitry Vetrov
OffRL
21
8
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable
  speech synthesis
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
16
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized
  fine-grained VAE and auto-regressive prosody prior
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
Unsupervised Representation Disentanglement using Cross Domain Features
  and Adversarial Learning in Variational Autoencoder based Voice Conversion
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion
Wen-Chin Huang
Hao Luo
Hsin-Te Hwang
Chen-Chou Lo
Yu-Huai Peng
Yu Tsao
Hsin-Min Wang
DRL
17
42
0
22 Jan 2020
Deep Representation Learning in Speech Processing: Challenges, Recent
  Advances, and Future Trends
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Junaid Qadir
Björn W. Schuller
AI4TS
34
81
0
02 Jan 2020
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven
  Acoustic Embedding Selection
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Shubhi Tyagi
M. Nicolis
Jonas Rohnke
Thomas Drugman
Jaime Lorenzo-Trueba
32
32
0
02 Dec 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source
  End-to-End Text-to-Speech Toolkit
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
29
202
0
24 Oct 2019
Speech Recognition with Augmented Synthesized Speech
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
32
127
0
25 Sep 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification
  Capabilities
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
Slava Shechtman
A. Sorin
16
33
0
23 Sep 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic
  Speech -- a Deep Learning approach
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
16
10
0
05 Jul 2019
Learning Disentangled Representations of Timbre and Pitch for Musical
  Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Yin-Jyun Luo
Kat R. Agres
Dorien Herremans
22
46
0
19 Jun 2019
Using generative modelling to produce varied intonation for speech
  synthesis
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari
O. Watts
Simon King
29
29
0
10 Jun 2019
Non-Autoregressive Neural Text-to-Speech
Non-Autoregressive Neural Text-to-Speech
Kainan Peng
Ming-Yu Liu
Z. Song
Kexin Zhao
29
39
0
21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven
  Dynamic Hierarchical Conditional Variational Network
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
24
75
0
17 May 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhehuai Chen
Yonghui Wu
21
223
0
12 Apr 2019
Multi-reference Tacotron by Intercross Training for Style
  Disentangling,Transfer and Control in Speech Synthesis
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
18
46
0
04 Apr 2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence
  Modeling
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Jonathan Shen
Patrick Nguyen
Yonghui Wu
Zhehuai Chen
Mengzhao Chen
...
William Chan
Shubham Toshniwal
Baohua Liao
M. Nirschl
Pat Rondon
VLM
27
209
0
21 Feb 2019
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text
  Translation
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
31
159
0
05 Nov 2018
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhehuai Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
820
0
12 Jun 2018
Listening while Speaking: Speech Chain by Deep Learning
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
126
165
0
16 Jul 2017
Previous
12