Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhehuai Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 545 papers shown
Title
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
35
1
0
04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
Speaker-Aware Anti-Spoofing
Xuechen Liu
Md. Sahidullah
Kong Aik Lee
Tomi Kinnunen
32
3
0
02 Mar 2023
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
Shijun Wang
Jón Guðnason
Damian Borth
39
8
0
02 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
Yingting Li
Ambuj Mehrish
Shuaijiang Zhao
Rishabh Bhardwaj
Amir Zadeh
Navonil Majumder
Rada Mihalcea
Soujanya Poria
AAML
29
16
0
02 Mar 2023
DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction
R. Anantha
Kriti Bhasin
Daniela Aguilar
Prabal Vashisht
Becci Williamson
Srinivas Chappidi
24
0
0
01 Mar 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
16
6
0
28 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
28
6
0
27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
22
3
0
24 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Sudhanshu Srivastava
Ishika Gupta
Anusha Prakash
Jom Kuriakose
H. Murthy
VLM
21
1
0
13 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
24
191
0
07 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
18
0
30 Jan 2023
On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Philippe Gonzalez
T. S. Alstrøm
Tobias May
24
9
0
25 Jan 2023
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Kinyugo Maina
32
5
0
16 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend
Georgi Tinchev
Marta Czarnowska
Kamil Deja
K. Yanagisawa
Marius Cotescu
31
4
0
11 Jan 2023
Introducing Model Inversion Attacks on Automatic Speaker Recognition
Karla Pizzi
Franziska Boenisch
U. Sahin
Konstantin Böttinger
33
3
0
09 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
48
654
0
05 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
24
18
0
29 Dec 2022
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
16
2
0
28 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
T. Toda
33
8
0
16 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
20
0
0
15 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
33
1
0
11 Dec 2022
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Rishabh Dabral
Muhammad Hamza Mughal
Vladislav Golyanik
Christian Theobalt
DiffM
VGen
37
171
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet
Kan Huang
Kai Zhang
Ming-de Liu
17
2
0
08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
80
26
0
08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath
Shridevi S Patil
Gangotri Nadiger
R. Ganesan
32
20
0
07 Dec 2022
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
21
1
0
07 Dec 2022
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue
Daxin Tan
Nikos Kargas
David McHardy
C. Papayiannis
Antonio Bonafonte
Marek Střelec
Jonas Rohnke
A. Filandras
Trevor Wood
8
0
0
07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
30
2
0
06 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
21
8
0
03 Dec 2022
Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses
Yang Ai
Zhenhua Ling
21
24
0
29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
45
2
0
26 Nov 2022
Contextual Expressive Text-to-Speech
Jianhong Tu
Zeyu Cui
Xiaohuan Zhou
Siqi Zheng
Kaiqin Hu
Ju Fan
Chang Zhou
22
2
0
26 Nov 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
O. Watts
Lovisa Wihlborg
Cassia Valentini-Botinhao
38
3
0
25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
24
1
0
25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
27
1
0
25 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs
Harm Lameris
Shivam Mehta
G. Henter
Joakim Gustafson
Éva Székely
46
15
0
24 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S V
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
43
18
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
30
5
0
16 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
16
9
0
13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
39
12
0
13 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
Gaku Narita
Junichi Shimizu
Taketo Akama
GAN
31
11
0
10 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
28
6
0
07 Nov 2022
Deliberation Networks and How to Train Them
Qingyun Dou
Mark Gales
24
0
0
06 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
32
2
0
06 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
43
18
0
04 Nov 2022
SpectroMap: Peak detection algorithm for audio fingerprinting
A. López-García
38
0
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
19
0
0
01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
24
25
0
01 Nov 2022