Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Title
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
89
33
0
31 Mar 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
95
21
0
31 Mar 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
Shuai Guo
Jiatong Shi
Tao Qian
Shinji Watanabe
Qin Jin
137
13
0
31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
140
30
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
95
53
0
31 Mar 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism
Jingbei Li
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
45
21
0
31 Mar 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
Yuma Koizumi
Heiga Zen
Kohei Yatabe
Nanxin Chen
M. Bacchiani
DiffM
103
49
0
31 Mar 2022
Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE
Ziang Long
Yunling Zheng
Meng Yu
Jack Xin
DRL
63
5
0
30 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Junrui Ni
Liming Wang
Heting Gao
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
78
25
0
29 Mar 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning
Takaaki Saeki
Kentaro Tachibana
Ryuichi Yamamoto
60
11
0
29 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
Andros Tjandra
S. Sakti
VLM
60
4
0
29 Mar 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Edresson Casanova
C. Shulby
Alexander Korolev
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
153
14
0
29 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Sunghwan Ahn
Joun Yeop Lee
N. Kim
113
26
0
29 Mar 2022
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis
Kei Furukawa
Takeshi Kishiyama
Satoshi Nakamura
23
1
0
29 Mar 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent
Yuki Saito
Yuto Nishimura
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
128
12
0
28 Mar 2022
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
61
4
0
28 Mar 2022
Attacker Attribution of Audio Deepfakes
Nicolas Müller
Franziska Dieckmann
Jennifer Williams
60
15
0
28 Mar 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Sangjun Park
Kihyun Choo
Joohyung Lee
A. Porov
Konstantin Osipov
June Sig Sung
72
6
0
27 Mar 2022
A Neural Vocoder Based Packet Loss Concealment Algorithm
Yaofeng Zhou
C. Bao
64
2
0
26 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
85
21
0
24 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
60
12
0
23 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
34
2
0
21 Mar 2022
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise
T. Raitio
Petko N. Petkov
Jiangchuan Li
M. Shifas
Andrea Davis
Y. Stylianou
48
2
0
20 Mar 2022
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
Jinlong Xue
Yayue Deng
Yichen Han
Ya Li
Jianqing Sun
Jiaen Liang
58
8
0
20 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan
Yongbin Feng
Mingming Ye
Cheng Tuo
Minghang Zhang
133
3
0
18 Mar 2022
Improve few-shot voice cloning using multi-modal learning
Haitong Zhang
Yue Lin
51
8
0
18 Mar 2022
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
121
53
0
18 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt
Abdelhamid Ezzerg
Piotr Bilinski
Magdalena Proszewska
Kamil Pokora
Roberto Barra-Chicote
Daniel Korzekwa
124
15
0
15 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
93
110
0
14 Mar 2022
Are discrete units necessary for Spoken Language Modeling?
Tu Nguyen
Benoît Sagot
Emmanuel Dupoux
108
26
0
11 Mar 2022
Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations
Ruijie Yan
Shuang Peng
Haitao Mi
Liang Jiang
Shihui Yang
Yuchi Zhang
Jiajun Li
Liangrui Peng
Yongliang Wang
Zujie Wen
44
4
0
08 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
102
29
0
07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
138
0
0
06 Mar 2022
NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation
Tao Wang
Ruibo Fu
Jiangyan Yi
J. Tao
Zhengqi Wen
28
2
0
05 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
89
62
0
04 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
104
109
0
02 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
358
7
0
02 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
79
3
0
02 Mar 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
146
4
0
01 Mar 2022
Revisiting Over-Smoothness in Text to Speech
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
148
64
0
26 Feb 2022
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet
J. Valin
Umut Isik
Paris Smaragdis
A. Krishnaswamy
62
4
0
22 Feb 2022
Wavebender GAN: An architecture for phonetically meaningful speech manipulation
Gustavo Teodoro Döhler Beck
Ulme Wennberg
Zofia Malisz
G. Henter
AI4CE
94
8
0
22 Feb 2022
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme
Jianhao Ye
Hongbin Zhou
Zhiba Su
Wendi He
Kaimeng Ren
Lin Li
Heng Lu
50
4
0
22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
KELM
79
20
0
21 Feb 2022
It's Raw! Audio Generation with State-Space Models
Karan Goel
Albert Gu
Chris Donahue
Christopher Ré
113
195
0
20 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
75
103
0
18 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge
Jiangyan Yi
Ruibo Fu
J. Tao
Shuai Nie
Haoxin Ma
...
Le Xu
Zhengqi Wen
Haizhou Li
Zheng Lian
Bin Liu
79
185
0
17 Feb 2022
Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis
Tao Wang
Ruibo Fu
Jiangyan Yi
J. Tao
Zhengqi Wen
49
7
0
16 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
96
43
0
16 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing
Eugene Kharitonov
Jade Copet
Kushal Lakhotia
Tu Nguyen
Paden Tomasello
...
A. Elkahky
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
129
34
0
15 Feb 2022