1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhifeng Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Erica Cooper
Xin Wang
Junichi Yamagishi
39
6
0
25 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Yuzi Yan
Xu Tan
Bohan Li
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Tie-Yan Liu
20
45
0
20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
Saida Mussakhojayeva
Aigerim Janaliyeva
A. Mirzakhmetov
Yerbolat Khassanov
H. A. Varol
17
14
0
17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
27
8
0
16 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
39
21
0
14 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Dan Su
Helen Meng
46
6
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
14
23
0
14 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao
Tyler Vuong
Mahsa Elyasi
Gaurav Bharaj
Rita Singh
26
20
0
08 Apr 2021
Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features
Mahsa Elyasi
Gaurav Bharaj
19
2
0
08 Apr 2021
Phoneme-based Distribution Regularization for Speech Enhancement
Yajing Liu
Xiulian Peng
Zhiwei Xiong
Yan Lu
10
4
0
08 Apr 2021
Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi
Ye Bai
J. Tao
Haoxin Ma
Zhengkun Tian
Chenglong Wang
Tao Wang
Ruibo Fu
21
82
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
Helen Meng
25
47
0
08 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
25
191
0
03 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
39
32
0
03 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
33
7
0
02 Apr 2021
Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling
Qing He
Zhiping Xiu
T. Koehler
Jilong Wu
24
7
0
01 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
8
6
0
01 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
Kun Zhou
Berrak Sisman
Haizhou Li
28
27
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
52
81
0
28 Mar 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati
Damian Borth
CLL
27
9
0
26 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
24
30
0
17 Mar 2021
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system
Noé Tits
Kevin El Haddad
Thierry Dutoit
24
5
0
06 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
42
188
0
01 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
128
299
0
25 Feb 2021
AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou
Athanasios Katsamanis
21
0
0
18 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Onoma-to-wave: Environmental sound synthesis from onomatopoeic words
Yuki Okamoto
Keisuke Imoto
Shinnosuke Takamichi
Ryosuke Yamanishi
Takahiro Fukumori
Y. Yamashita
13
14
0
11 Feb 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
22
58
0
08 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Chenpeng Du
K. Yu
36
17
0
01 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
11
0
0
30 Jan 2021
Expressive Neural Voice Cloning
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
35
30
0
30 Jan 2021
High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion
M. S. Al-Radhi
16
1
0
25 Jan 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song
Ryuichi Yamamoto
Min-Jae Hwang
Jin-Seob Kim
Ohsung Kwon
Jae-Min Kim
19
14
0
19 Jan 2021
Whispered and Lombard Neural Speech Synthesis
Qiong Hu
T. Bleisch
Petko N. Petkov
T. Raitio
Erik Marchi
V. Lakshminarasimhan
12
14
0
13 Jan 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
25
8
0
03 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
37
66
0
31 Dec 2020
Unified Mandarin TTS Front-end Based on Distilled BERT Model
Yang Zhang
Liqun Deng
Yasheng Wang
21
24
0
31 Dec 2020
Building Multi lingual TTS using Cross Lingual Voice Conversion
Qinghua Sun
Kenji Nagamatsu
6
3
0
28 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
8
16
0
23 Dec 2020
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
Changhe Song
Jingbei Li
Yixuan Zhou
Zhiyong Wu
Helen Meng
30
6
0
13 Dec 2020
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch
Joseph P. Turian
Max Henry
24
29
0
08 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
22
38
0
07 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
22
8
0
03 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
16
9
0
03 Dec 2020
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge
Bichen Wu
Qing He
Peizhao Zhang
T. Koehler
Kurt Keutzer
Peter Vajda
31
6
0
25 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
28
73
0
17 Nov 2020
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang
Wenshuo Ge
Xiong Wang
Shan Yang
Wendong Gan
Haitao Chen
Hai Li
Lei Xie
Xiulin Li
CVBM
41
32
0
17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
19
5
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
27
55
0
17 Nov 2020