1703.10135
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Papers citing "Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Title
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
Wei Song
Ya Yue
Ya-Jie Zhang
Zhengchen Zhang
Youzheng Wu
Xiaodong He
32
4
0
02 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
43
18
0
01 Nov 2022
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices
K. Markopoulos
Georgia Maniati
G. Vamvoukakis
Nikolaos Ellinas
Georgios Vardaxoglou
...
Gunu Jho
Inchul Hwang
Aimilios Chalamandaris
Pirros Tsiakoulis
S. Raptis
44
1
0
01 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
29
2
0
31 Oct 2022
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
L. Attorresi
Davide Salvi
Clara Borrelli
Paolo Bestagini
Stefano Tubaro
21
22
0
31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
34
0
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
23
0
0
28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
31
10
0
28 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
152
7
0
27 Oct 2022
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech
Kyumin Park
Keon Lee
Daeyoung Kim
Dongyeop Kang
26
0
0
26 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
28
5
0
25 Oct 2022
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Kun Zhou
Berrak Sisman
Carlos Busso
Bin Ma
Haizhou Li
37
3
0
25 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
36
0
0
24 Oct 2022
Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS
Chunyu Qiang
J. Tao
Ruibo Fu
Zhengqi Wen
Jiangyan Yi
Tao Wang
Shiming Wang
11
0
0
20 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Detai Xin
Hiroshi Saruwatari
45
3
0
18 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka
Shinnosuke Takamichi
Keisuke Imoto
Yuki Okamoto
Kazuki Fujii
Hiroshi Saruwatari
DiffM
24
8
0
17 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
27
12
0
14 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
46
5
0
13 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
35
17
0
12 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
37
14
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
27
5
0
12 Oct 2022
Style-Guided Inference of Transformer for High-resolution Image Synthesis
Jonghwa Yim
Minjae Kim
ViT
37
0
0
11 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
20
53
0
06 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
Daniele Mari
Federica Latora
Simone Milani
15
11
0
06 Oct 2022
A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion
Muhan Na
Rui Liu
Feilong
Guanglai Gao
35
0
0
24 Sep 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu
Pengkai Yin
Rui Liu
F. Bao
Guanglai Gao
18
5
0
22 Sep 2022
AutoLV: Automatic Lecture Video Generator
Wen Wang
Yang Song
Sanjay Jha
VGen
29
3
0
19 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue
Frank Soong
Shaofei Zhang
Linfu Xie
27
23
0
14 Sep 2022
Deep Speech Synthesis from Articulatory Representations
Peter Wu
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
39
25
0
13 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation
Peining Zhang
Junliang Guo
Linli Xu
Mu You
Junming Yin
24
0
0
05 Sep 2022
Evaluating generative audio systems and their metrics
Ashvala Vinay
Alexander Lerch
35
19
0
31 Aug 2022
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Abeysinghe
Jesin James
C. Watson
Felix Marattukalam
32
2
0
21 Aug 2022
Fully Automated End-to-End Fake Audio Detection
Chenglong Wang
Jiangyan Yi
J. Tao
Haiyang Sun
Xun Chen
Zhengkun Tian
Haoxin Ma
Cunhang Fan
Ruibo Fu
26
28
0
20 Aug 2022
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
34
0
0
18 Aug 2022
Enhancing Audio Perception of Music By AI Picked Room Acoustics
Prateek Verma
J. Berger
21
0
0
16 Aug 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
27
44
0
11 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Xiang Li
Changhe Song
X. Wei
Zhiyong Wu
Jia Jia
Helen Meng
29
4
0
10 Aug 2022
AdaCat: Adaptive Categorical Discretization for Autoregressive Models
Qiyang Li
Ajay Jain
Pieter Abbeel
OffRL
45
4
0
03 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
27
4
0
03 Aug 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
Da-Rong Liu
Po-Chun Hsu
Yi-Chen Chen
Sung-Feng Huang
Shun-Po Chuang
Da-Yi Wu
Hung-yi Lee
GAN
31
7
0
29 Jul 2022
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
Artem Ploujnikov
Mirco Ravanelli
9
18
0
27 Jul 2022
A Proposal for Foley Sound Synthesis Challenge
Keunwoo Choi
Sangshin Oh
Minsung Kang
Brian McFee
26
11
0
21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
36
297
0
20 Jul 2022
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Thierry Desot
François Portet
Michel Vacher
27
12
0
17 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement
Rodolfo Zevallos
Núria Bel
Guillermo Cámbara
Mireia Farrús
Jordi Luque
VLM
SyDa
19
6
0
14 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
195
0
13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
36
10
0
13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Nabarun Goswami
Tatsuya Harada
26
5
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
J. Tao
Tao Wang
Shiming Wang
Ruibo Fu
24
26
0
12 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
T. Toda
21
15
0
10 Jul 2022