1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Title
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
19
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
30
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
21
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
27
17
0
19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
K. Markopoulos
Nikolaos Ellinas
Alexandra Vioni
Myrsini Christidou
Panos Kakoulidis
...
Georgia Maniati
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
Aimilios Chalamandaris
16
2
0
17 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati
Nikolaos Ellinas
K. Markopoulos
G. Vamvoukakis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
8
14
0
17 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Aimilios Chalamandaris
Georgia Maniati
Panos Kakoulidis
S. Raptis
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
22
36
0
17 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning
Songxiang Liu
Dan Su
Dong Yu
25
10
0
14 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
Damien Ronssin
Milos Cernak
28
10
0
12 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
28
29
0
07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
22
56
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
12
17
0
07 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
27
8
0
02 Nov 2021
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units
Anurag Katakkar
A. Black
AuLLM
30
1
0
31 Oct 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
A. P. D. Martos
Albert Sanchis
Alfons Juan-Císcar
19
6
0
29 Oct 2021
Beyond L_p clipping: Equalization-based Psychoacoustic Attacks against ASRs
H. Abdullah
Muhammad Sajidur Rahman
Christian Peeters
Cassidy Gibson
Washington Garcia
Vincent Bindschaedler
T. Shrimpton
Patrick Traynor
AAML
19
9
0
25 Oct 2021
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection
Zhenyu Zhang
Yewei Gu
Xiaowei Yi
Xianfeng Zhao
34
24
0
18 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
40
0
15 Oct 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu
Changhan Wang
Hongyu Gong
Xutai Ma
Yun Tang
J. Pino
27
4
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
55
60
0
15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
26
0
0
14 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
29
14
0
13 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Li-Wei Chen
Alexander I. Rudnicky
88
30
0
12 Oct 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Hung-yi Lee
Shinji Watanabe
Tomoki Toda
38
40
0
12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
38
15
0
12 Oct 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example
Hieu-Thi Luong
Junichi Yamagishi
52
9
0
11 Oct 2021
Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding
Chao Wang
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Yibiao Yu
Zejun Ma
29
17
0
10 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
Yunchao He
Jian Luan
Yujun Wang
30
1
0
09 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis
Cheng Gong
Longbiao Wang
Zhenhua Ling
Ju Zhang
J. Dang
21
5
0
09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
18
16
0
08 Oct 2021
KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms
Chien-Feng Liao
Jen-Yu Liu
Yi-Hsuan Yang
29
5
0
08 Oct 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system
Guangyan Zhang
Yichong Leng
Daxin Tan
Ying Qin
Kaitao Song
Xu Tan
Sheng Zhao
Tan Lee
27
2
0
08 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions
F. Bous
L. Benaroya
Nicolas Obin
Axel Roebel
24
2
0
07 Oct 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
33
9
0
07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
41
19
0
07 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
47
78
0
30 Sep 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Shilu Lin
Wenchao Su
Li Meng
Fenglong Xie
Xinhui Li
Li Lu
37
4
0
28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
36
3
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
45
39
0
20 Sep 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
40
17
0
17 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
34
42
0
14 Sep 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Chuanxin Tang
Chong Luo
Zhiyuan Zhao
Dacheng Yin
Yucheng Zhao
Wenjun Zeng
24
9
0
12 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Dan Su
Dong Yu
AI4TS
35
10
0
08 Sep 2021
Text-Free Prosody-Aware Generative Spoken Language Modeling
Eugene Kharitonov
Ann Lee
Adam Polyak
Yossi Adi
Jade Copet
...
Tu Nguyen
M. Rivière
Abdel-rahman Mohamed
Emmanuel Dupoux
Wei-Ning Hsu
37
117
0
07 Sep 2021
Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors
Hasam Khalid
Minhan Kim
Shahroz Tariq
Simon S. Woo
36
83
0
07 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
40
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
37
19
0
25 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
30
4
0
18 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
47
6
0
16 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform
Youxuan Ma
Zongze Ren
Shugong Xu
48
39
0
12 Aug 2021