Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 October 2019

Papers citing "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram"

50 / 464 papers shown

Title
Speech Enhancement for Wake-Up-Word detection in Voice Assistants David Bonet Guillermo Cámbara Fernando López Pablo Gómez Carlos Segura Jordi Luque 62 11 0 29 Jan 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss Eunwoo Song Ryuichi Yamamoto Min-Jae Hwang Jin-Seob Kim Ohsung Kwon Jae-Min Kim 61 14 0 19 Jan 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans Shinji Watanabe Florian Boyer Xuankai Chang Pengcheng Guo Tomoki Hayashi ... Shigeki Karita Chenda Li Jing Shi Aswin Shanmugam Subramanian Wangyou Zhang VLM 108 38 0 23 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling Chen Zhang Yi Ren Xu Tan Jinglin Liu Ke-jun Zhang Tao Qin Sheng Zhao Tie-Yan Liu DiffM 97 38 0 17 Dec 2020
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch Joseph P. Turian Max Henry 49 31 0 08 Dec 2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training Haohan Guo Heng Lu Na Hu Chunlei Zhang Shan Yang Lei Xie Jane Polak Scowcroft Dong Yu AAML 68 12 0 03 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution Zhen Zeng Jianzong Wang Ning Cheng Jing Xiao 44 8 0 03 Dec 2020
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains Won Jang D. Lim Jaesam Yoon 60 34 0 19 Nov 2020
Single channel voice separation for unknown number of speakers under reverberant and noisy settings Shlomo E. Chazan Lior Wolf Eliya Nachmani Yossi Adi 78 29 0 04 Nov 2020
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion Disong Wang Songxiang Liu Lifa Sun Xixin Wu Xunying Liu Helen Meng 30 9 0 03 Nov 2020
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization Ahmed Mustafa N. Pia Guillaume Fuchs 91 73 0 03 Nov 2020
Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech Yeunju Choi Youngmoon Jung Youngjoo Suh Hoirin Kim 125 6 0 02 Nov 2020
CVC: Contrastive Learning for Non-parallel Voice Conversion Tingle Li Yichen Liu Chenxu Hu Hang Zhao DRL 100 13 0 02 Nov 2020
Speech Synthesis and Control Using Differentiable DSP Giorgio Fabbro Vladimir Golkov Thomas Kemp Zorah Lähner 78 12 0 28 Oct 2020
Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators Ryuichi Yamamoto Eunwoo Song Min-Jae Hwang Jae-Min Kim 74 18 0 27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer Pengcheng Guo Florian Boyer Xuankai Chang Tomoki Hayashi Yosuke Higuchi ... Jing Shi Shinji Watanabe Kun Wei Wangyou Zhang Yuekai Zhang 89 263 0 26 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis Min-Jae Hwang Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 44 32 0 26 Oct 2020
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations Wen-Chin Huang Yi-Chiao Wu Tomoki Hayashi Tomoki Toda BDL 111 38 0 23 Oct 2020
NU-GAN: High resolution neural upsampling with GAN Rithesh Kumar Kundan Kumar Vicki Anand Yoshua Bengio Aaron Courville 65 26 0 22 Oct 2020
BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks Yang Jiao 29 0 0 21 Oct 2020
Automatic multitrack mixing with a differentiable mixing console of neural audio effects C. Steinmetz Jordi Pons Santiago Pascual Joan Serrà 113 51 0 20 Oct 2020
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training Renjie Zheng Mingbo Ma Baigong Zheng Kaibo Liu Jiahong Yuan Kenneth Church Liang Huang 49 14 0 20 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 181 1,954 0 12 Oct 2020
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders Wen-Chin Huang Patrick Lumban Tobing Yi-Chiao Wu Kazuhiro Kobayashi Tomoki Toda 79 8 0 09 Oct 2020
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN Patrick Lumban Tobing Yi-Chiao Wu Tomoki Toda DRL 55 14 0 09 Oct 2020
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics Hirokazu Kameoka Takuhiro Kaneko Kou Tanaka Nobukatsu Hojo Shogo Seki DiffM 124 21 0 06 Oct 2020
The Academia Sinica Systems of Voice Conversion for VCC2020 Yu-Huai Peng Cheng-Hung Hu A. Kang Hung-Shin Lee Pin-Yuan Chen Yu Tsao Hsin-Min Wang 46 2 0 06 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS Wen-Chin Huang Tomoki Hayashi Shinji Watanabe Tomoki Toda DRL 81 40 0 06 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis Zhifeng Kong Ming-Yu Liu Jiaji Huang Kexin Zhao Bryan Catanzaro DiffM BDL 216 1,471 0 21 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis Jiawei Chen Xu Tan Jian Luan Tao Qin Tie-Yan Liu VLM 102 93 0 03 Sep 2020
WaveGrad: Estimating Gradients for Waveform Generation Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi William Chan DiffM BDL 149 795 0 02 Sep 2020
Hierarchical Timbre-Painting and Articulation Generation Michael Michelashvili Lior Wolf 73 12 0 30 Aug 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion Yi Zhao Wen-Chin Huang Xiaohai Tian Junichi Yamagishi Rohan Kumar Das Tomi Kinnunen Zhenhua Ling Tomoki Toda 88 211 0 28 Aug 2020
Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks Hirokazu Kameoka Takuhiro Kaneko Kou Tanaka Nobukatsu Hojo 99 20 0 27 Aug 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder Hyun-Wook Yoon Sang-Hoon Lee Hyeong-Rae Noh Seong-Whan Lee 108 11 0 16 Aug 2020
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems Ravichander Vipperla Sangjun Park Kihyun Choo Samin S. Ishtiaq Kyoungbo Min S. Bhattacharya Abhinav Mehrotra Alberto Gil C. P. Ramos Nicholas D. Lane 72 26 0 11 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition Jin Xu Xu Tan Yi Ren Tao Qin Jian Li Sheng Zhao Tie-Yan Liu VLM 70 91 0 09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning Berrak Sisman Junichi Yamagishi Simon King Haizhou Li BDL 139 329 0 09 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion Wen-Chin Huang Tomoki Hayashi Yi-Chiao Wu Hirokazu Kameoka Tomoki Toda 118 40 0 07 Aug 2020
Unsupervised Cross-Domain Singing Voice Conversion Adam Polyak Lior Wolf Yossi Adi Yaniv Taigman 58 44 0 06 Aug 2020
HooliGAN: Robust, High Quality Neural Vocoding Ollie McCarthy Zo Ahmed 95 14 0 06 Aug 2020
A Spectral Energy Distance for Parallel Speech Synthesis A. Gritsenko Tim Salimans Rianne van den Berg Jasper Snoek Nal Kalchbrenner 71 70 0 03 Aug 2020
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network Jinhyeok Yang Junmo Lee Young-Ik Kim Hoonyoung Cho Injung Kim 82 73 0 30 Jul 2020
Translate Reverberated Speech to Anechoic Ones: Speech Dereverberation with BERT Yang Jiao 40 1 0 16 Jul 2020
Real Time Speech Enhancement in the Waveform Domain Alexandre Défossez Gabriel Synnaeve Yossi Adi 109 466 0 23 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 155 1,415 0 08 Jun 2020
End-to-End Adversarial Text-to-Speech Jeff Donahue Sander Dieleman Mikolaj Binkowski Erich Elsen Karen Simonyan 85 187 0 05 Jun 2020
An ASR Guided Speech Intelligibility Measure for TTS Model Selection Arun Baby Saranya Vinnaitherthan Nagaraj Adiga Pranav Jawale Sumukh Badam Sharath Adavanne Srikanth Konjeti 40 7 0 02 Jun 2020
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation Yi-Chiao Wu Tomoki Hayashi T. Okamoto Hisashi Kawai Tomoki Toda 73 4 0 18 May 2020
MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search Naihan Li Shujie Liu Yanqing Liu Sheng Zhao Ming-Yuan Liu Ming Zhou 50 6 0 18 May 2020