arXiv:1703.10135
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Title
tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context
Nils L. Westhausen
B. Meyer
23
7
0
04 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim
Joanna Hong
Y. Ro
33
51
0
04 Apr 2022
Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee
Joun Yeop Lee
Heejin Choi
Seongkyu Mun
Sangjun Park
Jae-Sung Bae
Chanwoo Kim
27
4
0
04 Apr 2022
On incorporating social speaker characteristics in synthetic speech
S. Rallabandi
Sebastian Möller
21
0
0
03 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
27
16
0
01 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
18
0
0
01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
22
32
0
31 Mar 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
30
21
0
31 Mar 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism
Jingbei Li
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
25
21
0
31 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
20
14
0
29 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Junrui Ni
Liming Wang
Heting Gao
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
25
25
0
29 Mar 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning
Takaaki Saeki
Kentaro Tachibana
Ryuichi Yamamoto
15
10
0
29 Mar 2022
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis
Kei Furukawa
Takeshi Kishiyama
Satoshi Nakamura
11
1
0
29 Mar 2022
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
25
4
0
28 Mar 2022
Attacker Attribution of Audio Deepfakes
Nicolas Müller
Franziska Dieckmann
Jennifer Williams
17
13
0
28 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
27
14
0
22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
16
2
0
21 Mar 2022
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
26
16
0
21 Mar 2022
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
Jinlong Xue
Yayue Deng
Yichen Han
Ya Li
Jianqing Sun
Jiaen Liang
12
8
0
20 Mar 2022
Improve few-shot voice cloning using multi-modal learning
Haitong Zhang
Yue Lin
21
8
0
18 Mar 2022
DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation
Yichao Yan
Zanwei Zhou
Zi Wang
Chen-Ning Yang
Xiaokang Yang
CVBM
21
19
0
15 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
33
29
0
07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
26
0
0
06 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
185
7
0
02 Mar 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
34
4
0
01 Mar 2022
Revisiting Over-Smoothness in Text to Speech
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
87
61
0
26 Feb 2022
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
Bo Zhao
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DiffM
26
22
0
22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
KELM
27
18
0
21 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
32
88
0
18 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge
Jiangyan Yi
Ruibo Fu
J. Tao
Shuai Nie
Haoxin Ma
...
Le Xu
Zhengqi Wen
Haizhou Li
Zheng Lian
Bin Liu
25
176
0
17 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
40
41
0
16 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong
Cong Zhou
Taylor Berg-Kirkpatrick
Julian McAuley
27
17
0
12 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems
Jie Pu
Yi Meng
Oguz H. Elibol
15
2
0
07 Feb 2022
Tubes Among Us: Analog Attack on Automatic Speaker Identification
Shimaa Ahmed
Yash R. Wani
Ali Shahin Shamsabadi
Mohammad Yaghini
Ilia Shumailov
Nicolas Papernot
Kassem Fawaz
AAML
46
4
0
06 Feb 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
75
65
0
28 Jan 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
42
28
0
27 Jan 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention
Artem Gorodetskii
Ivan Ozhiganov
30
2
0
25 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
75
18
0
24 Jan 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
J. Yang
Lei He
36
11
0
20 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
25
97
0
19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
32
73
0
17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
70
54
0
10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
48
207
0
07 Jan 2022
Audio representations for deep learning in sound synthesis: A review
Anastasia Natsiou
Seán O'Leary
AI4TS
30
18
0
07 Jan 2022
A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram
Anastasia Natsiou
Seán O'Leary
25
3
0
07 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
37
11
0
23 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
31
143
0
15 Dec 2021
VocBench: A Neural Vocoder Benchmark for Speech Synthesis
Ehab A. AlBadawy
Andrew Gibiansky
Qing He
Jilong Wu
Ming-Ching Chang
Siwei Lyu
32
12
0
06 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
24
0
25 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Heeseung Kim
Sungwon Kim
Sungroh Yoon
DiffM
BDL
19
107
0
23 Nov 2021