ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Tacotron: Towards End-to-End Speech Synthesis (arXiv:1703.10135)

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
41
2
0
27 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
58
11
0
25 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
37
9
0
16 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
45
7
0
12 Jun 2024
Towards Signal Processing In Large Language Models
Prateek Verma
Mert Pilanci
47
3
0
10 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
46
2
0
09 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
50
15
0
08 Jun 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
RALM
VLM
42
4
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
40
4
0
06 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
MoE
45
0
0
05 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
38
5
0
04 Jun 2024
Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems
Haozhe Xu
Cong Wu
Yangyang Gu
Xingcan Shang
Jing Chen
Kun He
Ruiying Du
60
3
0
27 May 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Haoxiang Shi
Jianzong Wang
Xulong Zhang
Ning Cheng
Jun Yu
Jing Xiao
41
2
0
27 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
32
0
0
22 May 2024
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra
Zongyang Du
Berrak Sisman
46
2
0
18 May 2024
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
Yang Hou
Haitao Fu
Chuankai Chen
Zida Li
Haoyu Zhang
Jianjun Zhao
32
3
0
14 May 2024
Architecture of a Cortex Inspired Hierarchical Event Recaller
Valentín Puente Varona
16
1
0
03 May 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
40
1
0
30 Apr 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
49
0
0
27 Apr 2024
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Hao Wu
Jing Chen
Ruiying Du
Cong Wu
Kun He
Xingcan Shang
Hao Ren
Guowen Xu
AAML
52
8
0
24 Apr 2024
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Sen Liu
Yiwei Guo
Xie Chen
Kai Yu
29
1
0
23 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI
Tanja Sarcevic
Alicja Karlowicz
Rudolf Mayer
Ricardo A. Baeza-Yates
Andreas Rauber
49
6
0
22 Apr 2024
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
33
0
0
03 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
40
6
0
31 Mar 2024
Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks
Yang Ai
Zhenhua Ling
26
3
0
26 Mar 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
37
0
0
20 Mar 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
43
0
0
13 Mar 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
44
5
0
09 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
31
2
0
06 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
49
145
0
05 Mar 2024
AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models
Kazuki Kawamura
Jun Rekimoto
20
0
0
05 Mar 2024
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation
Ahmet Gunduz
K. Yuksel
Kareem Darwish
Golara Javadi
Fabio Minazzi
Nicola Sobieski
Sebastien Bratieres
25
0
0
26 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
28
7
0
25 Feb 2024
An Inpainting-Infused Pipeline for Attire and Background Replacement
F. Mahlow
A. F. Zanella
William Alberto Cruz-Castaneda
Marcellus Amadeus
41
0
0
05 Feb 2024
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Marius-Constantin Dinu
Claudiu Leoveanu-Condrei
Markus Holzleitner
Werner Zellinger
Sepp Hochreiter
45
10
0
01 Feb 2024
MunTTS: A Text-to-Speech System for Mundari
Varun Gumma
Rishav Hada
Aditya Yadavalli
Pamir Gogoi
Ishani Mondal
Vivek Seshadri
Kalika Bali
34
1
0
28 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
31
0
0
23 Jan 2024
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
Abhinav Garg
Jiyeon Kim
Sushil Khyalia
Chanwoo Kim
Dhananjaya N. Gowda
28
2
0
19 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
21
36
0
14 Jan 2024
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2
Aniket Tathe
Anand Kamble
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
35
1
0
11 Jan 2024
Denoising Vision Transformers
Jiawei Yang
Katie Z Luo
Jie Li
Kilian Q. Weinberger
Yonglong Tian
Yue Wang
DiffM
27
13
0
05 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
31
21
0
22 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
42
17
0
19 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
31
28
0
15 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
33
29
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
37
12
0
05 Dec 2023
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Anand Kamble
Aniket Tathe
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
30
1
0
24 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
35
31
0
21 Nov 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
32
0
0
14 Nov 2023