Title
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 33 2 0 12 Jun 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 27 20 0 10 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 60 1,357 0 08 Jun 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda 30 30 0 18 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 21 119 0 12 May 2020
Jukebox: A Generative Model for Music Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford Ilya Sutskever 28 722 0 30 Apr 2020
Direct Speech-to-image Translation Jiguo Li Xinfeng Zhang Chuanmin Jia Jizheng Xu Li Zhang Y. Wang Siwei Ma Wen Gao 36 29 0 07 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis Ting-Yao Hu A. Shrivastava Oncel Tuzel C. Dhir 11 30 0 09 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment Zhen Zeng Jianzong Wang Ning Cheng Tian Xia Jing Xiao 25 56 0 04 Mar 2020
Semi-Supervised Neural Architecture Search Renqian Luo Xu Tan Rui Wang Tao Qin Enhong Chen Tie-Yan Liu 13 88 0 24 Feb 2020
Vision-Infused Deep Audio Inpainting Hang Zhou Ziwei Liu Lingfeng Guo Ping Luo Dahua Lin 35 88 0 24 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 235 239 0 25 Sep 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 25 22 0 19 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 16 10 0 05 Jul 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features Zexin Cai Yaogen Yang Chuxiong Zhang Xiaoyi Qin Ming Li 27 26 0 03 Jul 2019
Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models Wei Fang Yu-An Chung James R. Glass 13 27 0 17 Jun 2019
Non-Autoregressive Neural Text-to-Speech Kainan Peng Ming-Yu Liu Z. Song Kexin Zhao 29 39 0 21 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 44 101 0 13 May 2019
A Light Dual-Task Neural Network for Haze Removal Yu Zhang Xinchao Wang Xiaojun Bi Dacheng Tao 31 13 0 12 Apr 2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 19 55 0 09 Apr 2019
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet J. Valin Jan Skoglund 24 78 0 28 Mar 2019
Securing Voice-driven Interfaces against Fake (Cloned) Audio Attacks Hafiz Malik 13 26 0 18 Feb 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers A. Koliousis Pijika Watcharapichat Matthias Weidlich Luo Mai Paolo Costa Peter R. Pietzuch 16 69 0 08 Jan 2019
Learning pronunciation from a foreign language in speech synthesis networks Younggun Lee Suwon Shon Taesu Kim 20 26 0 23 Nov 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention Bajibabu Bollepalli Lauri Juvela P. Alku 15 4 0 29 Oct 2018
Sample Efficient Adaptive Text-to-Speech Yutian Chen Yannis Assael Brendan Shillingford David Budden Scott E. Reed ... Ben Laurie Çağlar Gülçehre Aaron van den Oord Oriol Vinyals Nando de Freitas 35 149 0 27 Sep 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks Sercan Ö. Arik Heewoo Jun G. Diamos 14 106 0 20 Aug 2018
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions Yu Gu Yongguo Kang 10 17 0 22 Jun 2018
Voice Imitating Text-to-Speech Neural Networks Younggun Lee Taesu Kim Soo-Young Lee 26 11 0 04 Jun 2018
Collapsed speech segment detection and suppression for WaveNet vocoder Yi-Chiao Wu Kazuhiro Kobayashi Tomoki Hayashi Patrick Lumban Tobing T. Toda 7 25 0 30 Apr 2018
Speaker-independent raw waveform model for glottal excitation Lauri Juvela Vassilis Tsiaras Bajibabu Bollepalli Manu Airaksinen Junichi Yamagishi P. Alku 13 39 0 25 Apr 2018
Conditional End-to-End Audio Transforms Albert Haque Michelle Guo Prateek Verma 33 41 0 30 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 21 815 0 23 Mar 2018
Efficient Neural Audio Synthesis Nal Kalchbrenner Erich Elsen Karen Simonyan Seb Noury Norman Casagrande Edward Lockhart Florian Stimberg Aaron van den Oord Sander Dieleman Koray Kavukcuoglu 23 863 0 23 Feb 2018
Do WaveNets Dream of Acoustic Waves? Kanru Hua 24 1 0 23 Feb 2018
Fitting New Speakers Based on a Short Untranscribed Sample Eliya Nachmani Adam Polyak Yaniv Taigman Lior Wolf 21 84 0 20 Feb 2018
Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension Zhenhua Ling Yang Ai Yu Gu Lirong Dai 16 61 0 24 Jan 2018
Denoising Gravitational Waves using Deep Learning with Recurrent Denoising Autoencoders Hongyu Shen D. George Eliu A. Huerta Zhizhen Zhao 32 66 0 27 Nov 2017
Listening while Speaking: Speech Chain by Deep Learning Andros Tjandra S. Sakti Satoshi Nakamura 126 165 0 16 Jul 2017
Device Placement Optimization with Reinforcement Learning Azalia Mirhoseini Hieu H. Pham Quoc V. Le Benoit Steiner Rasmus Larsen Yuefeng Zhou Naveen Kumar Mohammad Norouzi Samy Bengio J. Dean 27 436 0 13 Jun 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech Sercan Ö. Arik G. Diamos Andrew Gibiansky John Miller Kainan Peng Ming-Yu Liu Jonathan Raiman Yanqi Zhou 22 494 0 24 May 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 47 1,804 0 29 Mar 2017
Pixel Recurrent Neural Networks Aaron van den Oord Nal Kalchbrenner Koray Kavukcuoglu 269 2,552 0 25 Jan 2016
Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion Kaisheng Yao Geoffrey Zweig 48 163 0 31 May 2015