ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.05884
  4. Cited By
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram
  Predictions

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhehuai Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
ArXivPDFHTML

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 545 papers shown
Title
Prosody Analysis of Audiobooks
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
32
1
0
10 Oct 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Yong Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
26
2
0
08 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge
  2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
23
2
0
06 Sep 2023
Voice Morphing: Two Identities in One Voice
Voice Morphing: Two Identities in One Voice
Sushant Pani
Anurag Chowdhury
Morgan Sandler
Arun Ross
32
1
0
05 Sep 2023
A Comparative Analysis of Pretrained Language Models for Text-to-Speech
A Comparative Analysis of Pretrained Language Models for Text-to-Speech
M. G. Moya
Panagiota Karanasou
S. Karlapati
Bastian Schnell
Nicole Peinelt
Alexis Moinet
Thomas Drugman
39
3
0
04 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for
  Text-to-Speech -- A Study between English and Mandarin
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
57
8
0
02 Sep 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive
  Text-to-Speech Synthesis
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
26
0
0
30 Aug 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
25
1
0
30 Aug 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
42
1
0
28 Aug 2023
The DKU-DUKEECE System for the Manipulation Region Location Task of ADD
  2023
The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
Zexin Cai
Weiqing Wang
Yikang Wang
Ming Li
30
6
0
20 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual
  Discriminators for High-Fidelity Multi-Speaker TTS
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
28
1
0
03 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
38
1
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech
  with Adversarial Learning and Architecture Design
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
37
0
31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context
  Information for Expressive Speech Synthesis
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
37
7
0
29 Jul 2023
A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
Xunjian Yin
Xiao-Yi Wan
ELM
12
3
0
25 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph
  Reading
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
29
5
0
03 Jul 2023
SysNoise: Exploring and Benchmarking Training-Deployment System
  Inconsistency
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Yan Wang
Yuhang Li
Ruihao Gong
Aishan Liu
Yanfei Wang
...
Yongqiang Yao
Yunchen Zhang
Tianzi Xiao
F. Yu
Xianglong Liu
AAML
37
0
0
01 Jul 2023
The Singing Voice Conversion Challenge 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
29
48
0
26 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
34
6
0
25 Jun 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Lakshminarayana
C. Dittmar
N. Pia
Emanuel Habets
34
0
0
16 Jun 2023
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps
  and Residual Convolutional Neural Networks
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
K. Paim
Ricardo Rohweder
M. R. Mendoza
R. Mansilha
Weverton Cordeiro
33
2
0
16 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural
  vocoders for high-quality audio synthesis
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
37
79
0
01 Jun 2023
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer
  Learning for Low-Resource Text-to-Speech
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
OffRL
29
0
0
01 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
26
0
0
31 May 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations
  for Text-to-Speech
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
34
13
0
31 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
48
66
0
30 May 2023
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced
  Driver-Assistance System
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
Jiwei Guan
Lei Pan
Chen Wang
Shui Yu
Longxiang Gao
Xi Zheng
AAML
24
3
0
30 May 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech
  Synthesis
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
28
6
0
29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of
  Speech in Glow-TTS
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
40
4
0
28 May 2023
Multilingual Text-to-Speech Synthesis for Turkic Languages Using
  Transliteration
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
Rustem Yeshpanov
Saida Mussakhojayeva
Yerbolat Khassanov
21
2
0
25 May 2023
Spoken Question Answering and Speech Continuation Using
  Spectrogram-Powered LLM
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
36
34
0
24 May 2023
EfficientSpeech: An On-Device Text to Speech Model
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
36
4
0
23 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
41
10
0
22 May 2023
Textually Pretrained Speech Language Models
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
49
54
0
22 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net
  Encoder With Multiple STFTs
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
32
1
0
18 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction
  of Amplitude and Phase Spectra
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
39
13
0
13 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
35
0
0
12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by
  Unsupervised Learning from Voice Recordings
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
39
1
0
09 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with
  Bidirectional Attention Mechanism for Automatic Dubbing
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
40
3
0
09 May 2023
Can deepfakes be created by novice users?
Can deepfakes be created by novice users?
Pulak Mehta
Gauri Jagatap
Kevin Gallagher
Brian Timmerman
Progga Deb
S. Garg
Rachel Greenstadt
Brendan Dolan-Gavitt
HAI
32
3
0
28 Apr 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
38
5
0
27 Apr 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High
  Fidelity Speech Synthesis
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
24
1
0
26 Apr 2023
OLISIA: a Cascade System for Spoken Dialogue State Tracking
OLISIA: a Cascade System for Spoken Dialogue State Tracking
Léo Jacqmin
Lucas Druart
Yannick Esteve
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
35
3
0
20 Apr 2023
Affective social anthropomorphic intelligent system
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
22
1
0
19 Apr 2023
Context-aware Coherent Speaking Style Prediction with Hierarchical
  Transformers for Audiobook Speech Synthesis
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
40
6
0
13 Apr 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive
  Structured Pruning
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
38
3
0
21 Mar 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
46
47
0
21 Mar 2023
Evaluating gesture generation in a large-scale open challenge: The GENEA
  Challenge 2022
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
Taras Kucherenko
Pieter Wolfert
Youngwoo Yoon
Carla Viegas
Teodor Nikolov
Mihail Tsakov
G. Henter
37
24
0
15 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
12
7
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
Previous
12345...91011
Next