NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

17 November 2022

Hyeongju Kim

Papers citing "NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis"

45 / 45 papers shown

Title
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System Hyeongju Kim Jinhyeok Yang Yechan Yu Seunghun Ji Jacob Morton Frederik Bous Joon Byun Juheon Lee 113 0 0 29 Mar 2025
Creating New Voices using Normalizing Flows Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Kamil Pokora Sebastian Cygert K. Yanagisawa Roberto Barra-Chicote Daniel Korzekwa 38 16 0 22 Dec 2023
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers Kaizhi Qian Yang Zhang Heting Gao Junrui Ni Cheng-I Jeff Lai David D. Cox M. Hasegawa-Johnson Shiyu Chang DRL 53 112 0 20 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 84 30 0 31 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus Minchan Kim Myeonghun Jeong Byoung Jin Choi Sunghwan Ahn Joun Yeop Lee N. Kim 96 26 0 29 Mar 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph Dacheng Yin Xuanchi Ren Chong Luo Yuwang Wang Zhiwei Xiong Wenjun Zeng 87 13 0 24 Feb 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training Wenyong Huang Zhenhe Zhang Y. Yeung Xin Jiang Qun Liu 64 23 0 25 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 51 100 0 19 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 217 403 0 04 Dec 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale Arun Babu Changhan Wang Andros Tjandra Kushal Lakhotia Qiantong Xu ... Yatharth Saraf J. Pino Alexei Baevski Alexis Conneau Michael Auli SSL 84 699 0 17 Nov 2021
Speaker Generation Daisy Stanton Matt Shannon Soroosh Mariooryad RJ Skerry-Ryan Eric Battenberg Tom Bagby David Kao 38 28 0 07 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations Hyeong-Seok Choi Juheon Lee W. Kim Jie Hwan Lee Hoon Heo Kyogu Lee 61 156 0 27 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis Max Morrison Rithesh Kumar Kundan Kumar Prem Seetharaman Aaron Courville Yoshua Bengio GAN 89 70 0 19 Oct 2021
ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection Junichi Yamagishi Xin Wang Massimiliano Todisco Md. Sahidullah J. Patino ... Xuechen Liu Kong Aik Lee Tomi Kinnunen Nicholas W. D. Evans Héctor Delgado 61 345 0 01 Sep 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units Wei-Ning Hsu Benjamin Bolte Yao-Hung Hubert Tsai Kushal Lakhotia Ruslan Salakhutdinov Abdel-rahman Mohamed SSL 145 2,939 0 14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 114 878 0 11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 96 173 0 06 Jun 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism Jinglin Liu Chengxi Li Yi Ren Feiyang Chen Zhou Zhao DiffM 84 264 0 06 May 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations Jheng-hao Lin Yist Y. Lin C. Chien Hung-yi Lee 106 56 0 07 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech Myeonghun Jeong Hyeongju Kim Sung Jun Cheon Byoung Jin Choi N. Kim DiffM 59 196 0 03 Apr 2021
Perceiver: General Perception with Iterative Attention Andrew Jaegle Felix Gimeno Andrew Brock Andrew Zisserman Oriol Vinyals João Carreira VLM ViT MDE 157 1,007 0 04 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice Mingjian Chen Xu Tan Bohan Li Yanqing Liu Tao Qin Sheng Zhao Tie-Yan Liu VLM DiffM 75 191 0 01 Mar 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components Yukiya Hono Shinji Takaki Kei Hashimoto Keiichiro Oura Yoshihiko Nankaku K. Tokuda 48 16 0 15 Feb 2021
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net Hyeong-Seok Choi Sungjin Park Jie Hwan Lee Hoon Heo Dongsuk Jeon Kyogu Lee 57 57 0 05 Feb 2021
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch Joseph P. Turian Max Henry 36 30 0 08 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 51 99 0 06 Nov 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention Yist Y. Lin C. Chien Jheng-hao Lin Hung-yi Lee Lin-Shan Lee 45 78 0 27 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 162 1,923 0 12 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis Jiawei Chen Xu Tan Jian Luan Tao Qin Tie-Yan Liu VLM 59 93 0 03 Sep 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 219 5,767 0 20 Jun 2020
End-to-End Adversarial Text-to-Speech Jeff Donahue Sander Dieleman Mikolaj Binkowski Erich Elsen Karen Simonyan 64 186 0 05 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 79 490 0 22 May 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification Brecht Desplanques Jenthe Thienpondt Kris Demuynck 72 1,328 0 14 May 2020
DDSP: Differentiable Digital Signal Processing Jesse Engel Lamtharn Hantrakul Chenjie Gu Adam Roberts DiffM 143 378 0 14 Jan 2020
SPICE: Self-supervised Pitch Estimation Beat Gfeller Christian Frank Dominik Roblek Matthew Sharifi Marco Tagliasacchi Mihajlo Velimirović 34 77 0 25 Oct 2019
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 54 818 0 25 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 274 240 0 25 Sep 2019
Adversarially Trained End-to-end Korean Singing Voice Synthesis System Juheon Lee Hyeong-Seok Choi Chang-Bin Jeon Junghyun Koo Kyogu Lee 32 77 0 06 Aug 2019
Neural source-filter waveform models for statistical parametric speech synthesis Xin Wang Shinji Takaki Junichi Yamagishi 66 118 0 27 Apr 2019
Attentive Statistics Pooling for Deep Speaker Embedding K. Okabe Takafumi Koshinaka Koichi Shinoda 87 530 0 29 Mar 2018
CREPE: A Convolutional Representation for Pitch Estimation Jong Wook Kim Justin Salamon P. Li J. P. Bello 49 378 0 17 Feb 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 622 130,942 0 12 Jun 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 153 1,819 0 29 Mar 2017
Language Modeling with Gated Convolutional Networks Yann N. Dauphin Angela Fan Michael Auli David Grangier 212 2,391 0 23 Dec 2016
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.4K 149,842 0 22 Dec 2014