1705.08947
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
24 May 2017
Sercan Ö. Arik
G. Diamos
Andrew Gibiansky
John Miller
Kainan Peng
Wei Ping
Jonathan Raiman
Yanqi Zhou
Papers citing
"Deep Voice 2: Multi-Speaker Neural Text-to-Speech"
50 / 87 papers shown
Title
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
55
2
0
16 Oct 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
183
0
0
30 Aug 2024
Speech as Interactive Design Material (SIDM): How to design and evaluate task-tailored synthetic voices?
Mateusz Dubiel
M. Aylett
Anuschka Schmitt
Zilin Ma
Gary Hsieh
Thiemo Wambsganss
23
0
0
26 Feb 2024
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Lei Xie
DiffM
41
8
0
02 Sep 2023
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
Adriana Stan
Johannah O'Mahony
39
0
0
19 Jul 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Lakshminarayana
C. Dittmar
N. Pia
Emanuel Habets
34
0
0
16 Jun 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
35
0
0
12 May 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
12
7
0
07 Mar 2023
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
45
2
0
26 Nov 2022
Contextual Expressive Text-to-Speech
Jianhong Tu
Zeyu Cui
Xiaohuan Zhou
Siqi Zheng
Kaiqin Hu
Ju Fan
Chang Zhou
17
2
0
26 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
122
7
0
27 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
20
53
0
06 Oct 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
25
1
0
26 Sep 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
34
6
0
22 Sep 2022
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Abeysinghe
Jesin James
C. Watson
Felix Marattukalam
26
2
0
21 Aug 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Lei Xie
Dan Su
DiffM
52
12
0
05 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
Wenbo Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
50
26
0
29 Jun 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
68
0
0
28 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
14
67
0
01 Apr 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
21
0
0
06 Mar 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
70
65
0
28 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
68
18
0
24 Jan 2022
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
28
142
0
15 Dec 2021
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey
Zahra Khanjani
Gabrielle Watson
V. P Janeja
25
25
0
28 Nov 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
18
56
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
12
17
0
07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schönherr
DiffM
129
123
0
04 Nov 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
15
0
0
14 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
26
2
0
06 Oct 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
28
36
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
26
3
0
21 Jun 2021
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
18
55
0
24 May 2021
Speaker disentanglement in video-to-speech conversion
Dan Oneaţă
Adriana Stan
H. Cucu
24
9
0
20 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
21
8
0
16 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
45
81
0
28 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
37
187
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
36
42
0
01 Feb 2021
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
26
50
0
11 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
30
219
0
22 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Wei Ping
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
34
1,392
0
21 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
19
92
0
03 Sep 2020
Adversarial representation learning for private speech generation
David Ericsson
Adam Östberg
Edvin Listo Zec
John Martinsson
Olof Mogren
27
16
0
16 Jun 2020