Tacotron: Towards End-to-End Speech Synthesis


29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
20
0
0
14 May 2025
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Stud
Hyouin Liu
Zhikuan Zhang
34
0
0
12 May 2025
Beyond Identity: A Generalizable Approach for Deepfake Audio Detection
Yasaman Ahmadiadli
Xiao-Ping Zhang
Naimul Khan
31
0
0
10 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
Yubing Cao
Yinfeng Yu
Yongming Li
Liejun Wang
29
0
0
12 Apr 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
48
0
0
11 Apr 2025
SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation
Stephen Brade
Sam Anderson
Rithesh Kumar
Zeyu Jin
Anh Truong
41
0
0
07 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
62
0
0
07 Apr 2025
RWKVTTS: Yet another TTS based on RWKV-7
Lin Yueyu
Liu Xiao
49
0
0
04 Apr 2025
Text-Driven Voice Conversion via Latent State-Space Modeling
Wen Li
Sofia Martinez
Priyanka Shah
53
0
0
26 Mar 2025
SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling
Yaofei Wang
Gang Pei
Kejiang Chen
Jinyang Ding
Chao Pan
Weilong Pang
Donghui Hu
Wenbo Zhang
51
1
0
25 Mar 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
44
0
0
27 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
68
0
0
26 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yansen Wang
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
66
4
0
28 Jan 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
Yangqiu Song
Yunxing Liu
Li Lu
41
0
0
15 Jan 2025
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Dongyang Dai
Zhiyong Wu
Shiyin Kang
Xixin Wu
Jia Jia
Dan Su
Dong Yu
Helen Meng
42
26
0
03 Jan 2025
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
168
4
0
22 Dec 2024
A Review of Human Emotion Synthesis Based on Generative Technology
Fei Ma
Yong Li
Yifan Xie
Y. He
Yujie Zhang
...
Z. Liu
Wei Yao
Fuji Ren
Fei Richard Yu
Shiguang Ni
78
1
0
10 Dec 2024
Methodology for Online Estimation of Rheological Parameters in Polymer Melts Using Deep Learning and Microfluidics
Juan Sandubete-López
José L. Risco-Martín
Alexander H. McMillan
Eva Besada-Portas
59
0
0
05 Dec 2024
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
Jiaxuan Liu
Zhaoci Liu
Yihan Hu
Yingying Gao
Shilei Zhang
Zhenhua Ling
DiffM
88
2
0
04 Dec 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
30
0
0
18 Nov 2024
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models
Dongrui Han
Mingyu Cui
Jiawen Kang
Xixin Wu
Xunying Liu
Helen Meng
32
1
0
12 Nov 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
Dake Guo
J.-H. Yao
Xinfa Zhu
Kangxiang Xia
Zhao Guo
Ziyu Zhang
Yishuo Wang
Jie Liu
Lei Xie
39
1
0
31 Oct 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings
Kangxiang Xia
Dake Guo
J.-H. Yao
Liumeng Xue
Hanzhao Li
...
Lei Xie
Qingqing Zhang
L. Luo
M. Dong
Peng Sun
57
1
0
31 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
34
0
0
24 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
32
1
0
22 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
18
2
0
17 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
23
0
0
09 Oct 2024
Can DeepFake Speech be Reliably Detected?
Hongbin Liu
Youzheng Chen
Arun Narayanan
Athula Balachandran
Pedro J. Moreno
Lun Wang
AAML
35
1
0
09 Oct 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
38
0
0
04 Oct 2024
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
44
1
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
56
3
0
23 Sep 2024
Preference Alignment Improves Language Model-Based TTS
Jinchuan Tian
Chunlei Zhang
Jiatong Shi
Hao Zhang
Jianwei Yu
Shinji Watanabe
Dong Yu
32
7
0
19 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
37
4
0
18 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
40
2
0
13 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Tien-Hong Lo
Meng-Ting Tsai
Berlin Chen
32
0
0
11 Sep 2024
A Framework for Synthetic Audio Conversations Generation using Large Language Models
Kaung Myat Kyaw
Jonathan Hoyin Chan
SyDa
31
2
0
02 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
36
42
0
01 Sep 2024
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
Haowei Lou
Helen Paik
Wen Hu
Lina Yao
VLM
41
0
0
27 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
45
1
0
20 Aug 2024
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language
Manjil Karki
Pratik Shakya
Sandesh Acharya
Ravi Pandit
Dinesh Gothe
31
0
0
19 Aug 2024
Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics
Thomas Thebaud
Gaël Le Lan
Anthony Larcher
AAML
37
0
0
14 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
41
0
0
11 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
40
0
0
08 Aug 2024
Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model
F. Mahlow
André Felipe Zanella
Stefano Recanatesi
Regilene Aparecida Sarzi Ribeiro
40
1
0
01 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
35
1
0
01 Aug 2024
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondřej Klejch
Peter Bell
45
0
0
17 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
54
33
0
11 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Ruibo Fu
Xin Qi
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xiaopeng Wang
Shuchen Shi
Yukun Liu
Xuefei Liu
Shuai Zhang
54
0
0
07 Jul 2024
CATT: Character-based Arabic Tashkeel Transformer
Faris Alasmary
Orjuwan Zaafarani
Ahmad Ghannam
38
0
0
03 Jul 2024