Tacotron: Towards End-to-End Speech Synthesis


29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
35
1
0
04 Mar 2023
LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion
Chunfeng Wang
Peisong Huang
Yuxiang Zou
Haoyu Zhang
Shichao Liu
Xiang Yin
Zejun Ma
16
2
0
02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
25
8
0
01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
19
8
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
38
27
0
27 Feb 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Brendan O'Connor
S. Dixon
24
0
0
27 Feb 2023
Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing
Alexandra Chronopoulou
Brian Thompson
Prashant Mathur
Yogesh Virkar
Surafel Melaku Lakew
Marcello Federico
26
7
0
25 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
22
3
0
24 Feb 2023
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants
Domna Bilika
Nikoletta Michopoulou
E. Alepis
Constantinos Patsakis
33
8
0
20 Feb 2023
MTTM: Metamorphic Testing for Textual Content Moderation Software
Wenxuan Wang
Jen-tse Huang
Weibin Wu
Jianping Zhang
Yizhan Huang
Shuqing Li
Pinjia He
Michael Lyu
58
30
0
11 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
30
10
0
22 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
38
15
0
21 Jan 2023
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
Simbarashe Nyatsanga
Taras Kucherenko
Chaitanya Ahuja
G. Henter
Michael Neff
SLR
44
90
0
13 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
23
3
0
10 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
31
23
0
10 Jan 2023
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
16
2
0
28 Dec 2022
Development and Evaluation of a Learning-based Model for Real-time Haptic Texture Rendering
Negin Heravi
Heather Culbertson
Allison M. Okamura
Jeannette Bohg
DiffM
9
8
0
27 Dec 2022
Source Tracing: Detecting Voice Spoofing
Tinglong Zhu
Xingming Wang
Xiaoyi Qin
Ming Li
29
12
0
16 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
T. Toda
33
8
0
16 Dec 2022
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator
Amrutha Prasad
Juan Pablo Zuluaga
P. Motlícek
Seyyed Saeed Sarfjoo
Iuliia Nigmatulina
Karel Veselý
36
3
0
14 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
34
6
0
13 Dec 2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
Junhui Zhang
Junjie Pan
Xiang Yin
Zejun Ma
27
0
0
12 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
28
1
0
11 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet
Kan Huang
Kai Zhang
Ming-de Liu
17
2
0
08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
75
25
0
08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath
Shridevi S Patil
Gangotri Nadiger
R. Ganesan
29
20
0
07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
30
2
0
06 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features
T. U. K. Reddy
Sahukari Chaitanya Varun
Kota Pranav Kumar Sankala Sreekanth
K. Murty
23
0
0
05 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
21
8
0
03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
23
13
0
30 Nov 2022
Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses
Yang Ai
Zhenhua Ling
21
24
0
29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
45
2
0
26 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
24
1
0
25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
27
1
0
25 Nov 2022
3d human motion generation from the text via gesture action classification and the autoregressive model
Gwantae Kim
Youngsuk Ryu
Junyeop Lee
D. Han
Jeongmin Bae
Hanseok Ko
17
2
0
18 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
43
18
0
17 Nov 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation
Chunyu Qiang
Peng Yang
Hao Che
Jinba Xiao
Xiaorui Wang
Zhongyuan Wang
20
3
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
20
46
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
30
5
0
16 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
39
12
0
13 Nov 2022
Online Phase Reconstruction via DNN-based Phase Differences Estimation
Yoshiki Masuyama
Kohei Yatabe
Kento Nagatomo
Yasuhiro Oikawa
3DV
16
7
0
12 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua Wu
40
0
0
07 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
28
6
0
07 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
32
2
0
06 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Jixun Yao
Qing Wang
Yi Lei
Pengcheng Guo
Linfu Xie
Namin Wang
Jie Liu
38
14
0
06 Nov 2022
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Xin Zhang
Iván Vallés-Pérez
A. Stolcke
Chengzhu Yu
J. Droppo
Olabanji Shonibare
Roberto Barra-Chicote
Venkatesh Ravichandran
33
6
0
04 Nov 2022
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
Yuchen Liu
Li-Chia Yang
Alex Pawlicki
Marko Stamenovic
27
6
0
04 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas
Karolos Nikitaras
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
26
0
0
02 Nov 2022
Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation
Yingjie Song
Wei Song
Wei Zhang
Zhengchen Zhang
Dan Zeng
Zhi Liu
Yang Yu
DiffM
19
5
0
02 Nov 2022