Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.09407
Cited By
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
17 November 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
Re-assign community
ArXiv
PDF
HTML
Papers citing
"NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis"
45 / 45 papers shown
Title
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
113
0
0
29 Mar 2025
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
38
16
0
22 Dec 2023
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
53
112
0
20 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
84
30
0
31 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Sunghwan Ahn
Joun Yeop Lee
N. Kim
96
26
0
29 Mar 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph
Dacheng Yin
Xuanchi Ren
Chong Luo
Yuwang Wang
Zhiwei Xiong
Wenjun Zeng
87
13
0
24 Feb 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
64
23
0
25 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
51
100
0
19 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
217
403
0
04 Dec 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
84
699
0
17 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
38
28
0
07 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi
Juheon Lee
W. Kim
Jie Hwan Lee
Hoon Heo
Kyogu Lee
61
156
0
27 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis
Max Morrison
Rithesh Kumar
Kundan Kumar
Prem Seetharaman
Aaron Courville
Yoshua Bengio
GAN
89
70
0
19 Oct 2021
ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection
Junichi Yamagishi
Xin Wang
Massimiliano Todisco
Md. Sahidullah
J. Patino
...
Xuechen Liu
Kong Aik Lee
Tomi Kinnunen
Nicholas W. D. Evans
Héctor Delgado
61
345
0
01 Sep 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
145
2,939
0
14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
114
878
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
96
173
0
06 Jun 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Jinglin Liu
Chengxi Li
Yi Ren
Feiyang Chen
Zhou Zhao
DiffM
84
264
0
06 May 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Jheng-hao Lin
Yist Y. Lin
C. Chien
Hung-yi Lee
106
56
0
07 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
59
196
0
03 Apr 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
157
1,007
0
04 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
75
191
0
01 Mar 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Yukiya Hono
Shinji Takaki
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
48
16
0
15 Feb 2021
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Hyeong-Seok Choi
Sungjin Park
Jie Hwan Lee
Hoon Heo
Dongsuk Jeon
Kyogu Lee
57
57
0
05 Feb 2021
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch
Joseph P. Turian
Max Henry
36
30
0
08 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
51
99
0
06 Nov 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention
Yist Y. Lin
C. Chien
Jheng-hao Lin
Hung-yi Lee
Lin-Shan Lee
45
78
0
27 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
162
1,923
0
12 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
59
93
0
03 Sep 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
219
5,767
0
20 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
64
186
0
05 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
79
490
0
22 May 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
72
1,328
0
14 May 2020
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
143
378
0
14 Jan 2020
SPICE: Self-supervised Pitch Estimation
Beat Gfeller
Christian Frank
Dominik Roblek
Matthew Sharifi
Marco Tagliasacchi
Mihajlo Velimirović
34
77
0
25 Oct 2019
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
54
818
0
25 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
274
240
0
25 Sep 2019
Adversarially Trained End-to-end Korean Singing Voice Synthesis System
Juheon Lee
Hyeong-Seok Choi
Chang-Bin Jeon
Junghyun Koo
Kyogu Lee
32
77
0
06 Aug 2019
Neural source-filter waveform models for statistical parametric speech synthesis
Xin Wang
Shinji Takaki
Junichi Yamagishi
66
118
0
27 Apr 2019
Attentive Statistics Pooling for Deep Speaker Embedding
K. Okabe
Takafumi Koshinaka
Koichi Shinoda
87
530
0
29 Mar 2018
CREPE: A Convolutional Representation for Pitch Estimation
Jong Wook Kim
Justin Salamon
P. Li
J. P. Bello
49
378
0
17 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
622
130,942
0
12 Jun 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
153
1,819
0
29 Mar 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
212
2,391
0
23 Dec 2016
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.4K
149,842
0
22 Dec 2014
1