Title
Pretraining Techniques for Sequence-to-Sequence Voice Conversion Wen-Chin Huang Tomoki Hayashi Yi-Chiao Wu Hirokazu Kameoka Tomoki Toda 27 38 0 07 Aug 2020
Peking Opera Synthesis via Duration Informed Attention Network Yusong Wu Shengchen Li Chengzhu Yu Heng Lu Chao Weng Liqiang Zhang Dong Yu 16 10 0 07 Aug 2020
DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System Liqiang Zhang Chengzhu Yu Heng Lu Chao Weng Chunlei Zhang Yusong Wu Xiang Xie Zijin Li Dong Yu 30 34 0 07 Aug 2020
HooliGAN: Robust, High Quality Neural Vocoding Ollie McCarthy Zo Ahmed 24 14 0 06 Aug 2020
PPSpeech: Phrase based Parallel End-to-End TTS System Yahuan Cong Ran Zhang Jian Luan 34 3 0 06 Aug 2020
Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning Jing-Xuan Zhang Zhenhua Ling Lirong Dai 15 6 0 05 Aug 2020
Audiovisual Speech Synthesis using Tacotron2 Ahmed Hussen Abdelaziz Anushree Prasanna Kumar Chloe Seivwright Gabriele Fanelli Justin Binder Y. Stylianou S. Kajarekar 20 15 0 03 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis Fengyu Yang Shan Yang Qinghua Wu Yujun Wang Lei Xie 39 5 0 03 Aug 2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning Jaesung Bae Hanbin Bae Young-Sun Joo Junmo Lee Gyeong-Hoon Lee Hoon-Young Cho 16 17 0 30 Jul 2020
Xiaomingbot: A Multilingual Robot News Reporter Runxin Xu Jun Cao Mingxuan Wang Jiaze Chen Hao Zhou ... Xiang Yin Xijin Zhang Songcheng Jiang Yuxuan Wang Lei Li 23 11 0 12 Jul 2020
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity Jordan J. Bird Diego Resende Faria Anikó Ekárt C. Premebida Pedro P. S. Ayrosa 25 5 0 01 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis Antti Suni Sofoklis Kakouros M. Vainio J. Šimko 19 17 0 29 Jun 2020
Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra Toru Nakashika Kohei Yatabe 13 0 0 24 Jun 2020
Audeo: Audio Generation for a Silent Performance Video Kun Su Xiulong Liu Eli Shlizerman VGen 33 67 0 23 Jun 2020
Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion Narjes Bozorg Michael T. Johnson 18 1 0 22 Jun 2020
Embodied Self-supervised Learning by Coordinated Sampling and Training Yifan Sun Xihong Wu SSL 30 7 0 20 Jun 2020
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement Luka Chkhetiani Levan Bejanidze 25 1 0 13 Jun 2020
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 33 2 0 12 Jun 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 32 20 0 10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 29 109 0 08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 60 1,362 0 08 Jun 2020
End-to-End Adversarial Text-to-Speech Jeff Donahue Sander Dieleman Mikolaj Binkowski Erich Elsen Karen Simonyan 22 186 0 05 Jun 2020
PJS: phoneme-balanced Japanese singing voice corpus Junya Koguchi Shinnosuke Takamichi 20 22 0 04 Jun 2020
An ASR Guided Speech Intelligibility Measure for TTS Model Selection Arun Baby Saranya Vinnaitherthan Nagaraj Adiga Pranav Jawale Sumukh Badam Sharath Adavanne Srikanth Konjeti 9 7 0 02 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder Kazi Nazmul Haque R. Rana Björn W Schuller DRL 31 12 0 01 Jun 2020
DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices Run Wang Felix Juefei Xu Yihao Huang Qing Guo Xiaofei Xie Lei Ma Yang Liu AAML 32 105 0 28 May 2020
NAUTILUS: a Versatile Voice Cloning System Hieu-Thi Luong Junichi Yamagishi 28 51 0 22 May 2020
Conversational End-to-End TTS for Voice Agent Haohan Guo Shaofei Zhang Frank Soong Lei He Lei Xie 34 67 0 21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis Yusuke Yasuda Xin Wang Junichi Yamagishi AI4TS 22 31 0 20 May 2020
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech Wenjie Li Benlai Tang Xiang Yin Yushi Zhao Wei Li Kang Wang Hao Huang Yuxuan Wang Zejun Ma 14 13 0 19 May 2020
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition S. Latif R. Rana Sara Khalifa Raja Jurdak Björn W. Schuller 46 28 0 18 May 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo Tomoki Toda ViT 30 30 0 18 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation Tao Tu Yuan-Jui Chen Alexander H. Liu Hung-yi Lee 33 7 0 16 May 2020
JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment D. Lim Won Jang Gyeonghwan O Heayoung Park Bongwan Kim Jaesam Yoon 27 36 0 15 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 16 61 0 14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 28 119 0 12 May 2020
AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN Zewang Zhang Qiao Tian Heng Lu Ling-Hao Chen Shan Liu 9 27 0 12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem Tomoki Hayashi Shinji Watanabe 27 32 0 12 May 2020
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech S. Karlapati Alexis Moinet Arnaud Joly V. Klimkov Daniel Sáez-Trigueros Thomas Drugman 19 67 0 30 Apr 2020
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise Shan Yang Yuxuan Wang Lei Xie 21 9 0 28 Apr 2020
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders Yu Gu Xiang Yin Yonghui Rao Yuan Wan Benlai Tang Yang Zhang Jitong Chen Yuxuan Wang Zejun Ma 28 70 0 23 Apr 2020
Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit Tomoki Koriyama Hiroshi Saruwatari BDL 26 5 0 22 Apr 2020
Transformer based Grapheme-to-Phoneme Conversion Sevinj Yolchuyeva Géza Németh Bálint Gyires-Tóth 35 63 0 14 Apr 2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data Soumi Maiti Erik Marchi Alistair Conkie 19 17 0 10 Apr 2020
Advancing Speech Synthesis using EEG G. Krishna Co Tran Mason Carnahan Ahmed H. Tewfik 27 11 0 09 Apr 2020
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System Gwenaelle Cunha Sergio Minho Lee 9 8 0 05 Apr 2020
Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments Koichiro Yoshino Kohei Wakimoto Yuta Nishimura Satoshi Nakamura 14 8 0 23 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment Zhen Zeng Jianzong Wang Ning Cheng Tian Xia Jing Xiao VLM 33 56 0 04 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Jing Xiao 24 21 0 04 Mar 2020
Semi-Supervised Neural Architecture Search Renqian Luo Xu Tan Rui Wang Tao Qin Enhong Chen Tie-Yan Liu 13 88 0 24 Feb 2020