ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
19
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
30
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
21
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
27
17
0
19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
K. Markopoulos
Nikolaos Ellinas
Alexandra Vioni
Myrsini Christidou
Panos Kakoulidis
...
Georgia Maniati
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
Aimilios Chalamandaris
16
2
0
17 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati
Nikolaos Ellinas
K. Markopoulos
G. Vamvoukakis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
8
14
0
17 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Aimilios Chalamandaris
Georgia Maniati
Panos Kakoulidis
S. Raptis
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
22
36
0
17 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning
Songxiang Liu
Dan Su
Dong Yu
25
10
0
14 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
Damien Ronssin
Milos Cernak
28
10
0
12 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
28
29
0
07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
22
56
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
12
17
0
07 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
27
8
0
02 Nov 2021
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units
Anurag Katakkar
A. Black
AuLLM
30
1
0
31 Oct 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
A. P. D. Martos
Albert Sanchis
Alfons Juan-Císcar
19
6
0
29 Oct 2021
Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs
H. Abdullah
Muhammad Sajidur Rahman
Christian Peeters
Cassidy Gibson
Washington Garcia
Vincent Bindschaedler
T. Shrimpton
Patrick Traynor
AAML
19
9
0
25 Oct 2021
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection
Zhenyu Zhang
Yewei Gu
Xiaowei Yi
Xianfeng Zhao
34
24
0
18 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
40
0
15 Oct 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu
Changhan Wang
Hongyu Gong
Xutai Ma
Yun Tang
J. Pino
27
4
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
55
60
0
15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
26
0
0
14 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
29
14
0
13 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Li-Wei Chen
Alexander I. Rudnicky
88
30
0
12 Oct 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Hung-yi Lee
Shinji Watanabe
Tomoki Toda
38
40
0
12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
38
15
0
12 Oct 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example
Hieu-Thi Luong
Junichi Yamagishi
52
9
0
11 Oct 2021
Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding
Chao Wang
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Yibiao Yu
Zejun Ma
29
17
0
10 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
Yunchao He
Jian Luan
Yujun Wang
30
1
0
09 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis
Cheng Gong
Longbiao Wang
Zhenhua Ling
Ju Zhang
J. Dang
21
5
0
09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
18
16
0
08 Oct 2021
KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms
Chien-Feng Liao
Jen-Yu Liu
Yi-Hsuan Yang
29
5
0
08 Oct 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system
Guangyan Zhang
Yichong Leng
Daxin Tan
Ying Qin
Kaitao Song
Xu Tan
Sheng Zhao
Tan Lee
27
2
0
08 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions
F. Bous
L. Benaroya
Nicolas Obin
Axel Roebel
24
2
0
07 Oct 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
33
9
0
07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
41
19
0
07 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
47
78
0
30 Sep 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Shilu Lin
Wenchao Su
Li Meng
Fenglong Xie
Xinhui Li
Li Lu
37
4
0
28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
36
3
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
45
39
0
20 Sep 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
40
17
0
17 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
34
42
0
14 Sep 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Chuanxin Tang
Chong Luo
Zhiyuan Zhao
Dacheng Yin
Yucheng Zhao
Wenjun Zeng
24
9
0
12 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Dan Su
Dong Yu
AI4TS
35
10
0
08 Sep 2021
Text-Free Prosody-Aware Generative Spoken Language Modeling
Eugene Kharitonov
Ann Lee
Adam Polyak
Yossi Adi
Jade Copet
...
Tu Nguyen
M. Rivière
Abdel-rahman Mohamed
Emmanuel Dupoux
Wei-Ning Hsu
37
117
0
07 Sep 2021
Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors
Hasam Khalid
Minhan Kim
Shahroz Tariq
Simon S. Woo
36
83
0
07 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
40
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
37
19
0
25 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
30
4
0
18 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
47
6
0
16 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform
Youxuan Ma
Zongze Ren
Shugong Xu
48
39
0
12 Aug 2021