Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Title
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
28
0
0
07 Nov 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
Hsin-Min Wang
34
5
0
05 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
36
27
0
02 Nov 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
13
3
0
22 Oct 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
53
6
0
19 Oct 2023
Recasting Continual Learning as Sequence Modeling
Soochan Lee
Jaehyeon Son
Gunhee Kim
CLL
25
9
0
18 Oct 2023
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
Tiberiu Boros
Stefan Daniel Dumitrescu
Ionut Mironica
Radu Chivereanu
GAN
19
1
0
14 Oct 2023
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
27
1
0
10 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
0
0
08 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
35
3
0
06 Oct 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
41
3
0
06 Oct 2023
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
Wenhao Guan
Qi Su
Haodong Zhou
Shiyu Miao
Xingjia Xie
Lin Li
Q. Hong
DiffM
20
13
0
29 Sep 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
15
4
0
29 Sep 2023
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
41
2
0
27 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
44
0
0
26 Sep 2023
BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
25
5
0
25 Sep 2023
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS
Dake Guo
Xinfa Zhu
Liumeng Xue
Tao Li
Yuanjun Lv
Yuepeng Jiang
Linfu Xie
22
1
0
25 Sep 2023
The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang
Zhuo Li
Jingze Lu
Hua Hua
Wenchao Wang
Pengyuan Zhang
40
19
0
21 Sep 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
39
1
0
16 Sep 2023
Large-Scale Automatic Audiobook Creation
Brendan Walsh
Mark Hamilton
Greg Newby
Xi Wang
Serena Ruan
...
Lei He
Shaofei Zhang
Eric Dettinger
William T. Freeman
Markus Weimer
31
1
0
07 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
23
2
0
06 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
52
41
0
05 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
37
4
0
02 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
41
8
0
02 Sep 2023
The FruitShell French synthesis system at the Blizzard 2023 Challenge
Xin Qi
Xiaopeng Wang
Zhiyong Wang
Wang Liu
Mingming Ding
Shuchen Shi
15
1
0
01 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
43
3
0
31 Aug 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis
Linsen Song
Wayne Wu
Chaoyou Fu
Chen Change Loy
Ran He
31
10
0
31 Aug 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
Weiqin Li
Shunwei Lei
Qiaochu Huang
Yixuan Zhou
Zhiyong Wu
Shiyin Kang
Helen Meng
27
4
0
31 Aug 2023
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
21
16
0
31 Aug 2023
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Haibo Wang
Shiwan Zhao
Xiguang Zheng
Yong Qin
29
11
0
31 Aug 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
14
0
0
30 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
35
38
0
24 Aug 2023
Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion
Jordan J. Bird
Ahmad Lotfi
13
16
0
24 Aug 2023
CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making
Giovanni Minelli
Mirco Musolesi
33
0
0
21 Aug 2023
Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models
Heyang Xue
Shuai Guo
Pengcheng Zhu
Mengxiao Bi
DiffM
40
1
0
21 Aug 2023
An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
Wenxuan Wang
Jingyuan Huang
Jen-tse Huang
Chang Chen
Jiazhen Gu
Pinjia He
Michael R. Lyu
VLM
36
6
0
18 Aug 2023
Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation
Yang Ai
Ye-Xin Lu
Zhenhua Ling
26
5
0
17 Aug 2023
Accurate synthesis of Dysarthric Speech for ASR data augmentation
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
21
2
0
16 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
Zhichao Wang
M. Dai
Keld Lundgaard
VGen
DiffM
45
2
0
12 Aug 2023
SAPIEN: Affective Virtual Agents Powered by Large Language Models
Masum Hasan
Cengiz Ozel
Sammy Potter
E. Hoque
VLM
LLMAG
27
7
0
06 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
20
1
0
03 Aug 2023
Music De-limiter Networks via Sample-wise Gain Inversion
Chang-Bin Jeon
Kyogu Lee
16
1
0
02 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
28
14
0
31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
35
7
0
29 Jul 2023
All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection
Daniele Mari
Davide Salvi
Paolo Bestagini
Simone Milani
11
5
0
28 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
WavJourney: Compositional Audio Creation with Large Language Models
Xubo Liu
Zhongkai Zhu
Haohe Liu
Yiitan Yuan
Meng Cui
...
Jinhua Liang
Yin Cao
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
AuLLM
34
25
0
26 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
25
2
0
20 Jul 2023
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Liangyu Zha
Junlin Zhou
Liyao Li
Rui Wang
Qingyi Huang
...
Xing-yan Deng
J. Xu
Haobo Wang
Gang Chen
Jun Zhao
RALM
LMTD
32
42
0
17 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
43
44
0
14 Jul 2023
Previous
1
2
3
4
5
6
...
15
16
17
Next