Title
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible Marcely Zanon Boito William N. Havard Mahault Garnerin Éric Le Ferrand Laurent Besacier 32 47 0 30 Jul 2019
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis Yuki Saito Shinnosuke Takamichi Hiroshi Saruwatari 8 10 0 19 Jul 2019
Forward-Backward Decoding for Regularizing End-to-End TTS Yibin Zheng Xi Wang Lei He Shifeng Pan Frank Soong Zhengqi Wen J. Tao 17 13 0 18 Jul 2019
Hierarchical Sequence to Sequence Voice Conversion with Limited Data P. Narayanan Punarjay Chakravarty F. Charette G. Puskorius 23 3 0 15 Jul 2019
Multi-Speaker End-to-End Speech Synthesis Jihyun Park Kexin Zhao Kainan Peng Wei Ping SyDa 14 19 0 09 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 16 10 0 05 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech V. Klimkov S. Ronanki Jonas Rohnke Thomas Drugman AI4TS 16 82 0 04 Jul 2019
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation Yi-Chiao Wu Tomoki Hayashi Patrick Lumban Tobing Kazuhiro Kobayashi T. Toda 21 16 0 01 Jul 2019
RUSLAN: Russian Spoken Language Corpus for Speech Synthesis Lenar Gabdrakhmanov Rustem Garaev E. Razinkov 23 9 0 26 Jun 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 8 72 0 26 Jun 2019
Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations Jing-Xuan Zhang Zhenhua Ling Lirong Dai 22 99 0 25 Jun 2019
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling Yuanhao Yi Yang Ai Zhenhua Ling Lirong Dai 13 33 0 21 Jun 2019
A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation Hieu-Thi Luong Junichi Yamagishi 40 10 0 18 Jun 2019
Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models Wei Fang Yu-An Chung James R. Glass 15 27 0 17 Jun 2019
Parametric Resynthesis with neural vocoders Soumi Maiti Michael I. Mandel 14 19 0 16 Jun 2019
Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise Christopher Larson Tarek Lahlou Diana Mingels Zachary Kulis Erik T. Mueller 14 2 0 13 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis Zack Hodari O. Watts Simon King 29 29 0 10 Jun 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis Eric Battenberg Soroosh Mariooryad Daisy Stanton RJ Skerry-Ryan Matt Shannon David Kao Tom Bagby BDL 19 45 0 08 Jun 2019
Survey on Publicly Available Sinhala Natural Language Processing Tools and Research Nisansa de Silva 30 43 0 05 Jun 2019
KERMIT: Generative Insertion-Based Modeling for Sequences William Chan Nikita Kitaev Kelvin Guu Mitchell Stern Jakob Uszkoreit VLM 23 65 0 04 Jun 2019
MelNet: A Generative Model for Audio in the Frequency Domain Sean Vasquez M. Lewis DiffM 24 131 0 04 Jun 2019
Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN David Álvarez Santiago Pascual Antonio Bonafonte 16 12 0 03 Jun 2019
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS Mutian He Yan Deng Lei He 12 81 0 03 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain Johanes Effendi Andros Tjandra S. Sakti Satoshi Nakamura 24 3 0 03 Jun 2019
SignalTrain: Profiling Audio Compressors with Deep Neural Networks Scott H. Hawley Benjamin Colburn S. I. Mimilakis 14 12 0 28 May 2019
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion Andy T. Liu Po-Chun Hsu Hung-yi Lee SSL 25 29 0 28 May 2019
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems Ohsung Kwon Eunwoo Song Jae-Min Kim Hong-Goo Kang 11 4 0 21 May 2019
Non-Autoregressive Neural Text-to-Speech Kainan Peng Wei Ping Z. Song Kexin Zhao 29 39 0 21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network V. Wan Chun-an Chan Tom Kenter Jakub Vít R. Clark 21 75 0 17 May 2019
MoGlow: Probabilistic and controllable motion synthesis using normalising flows G. Henter Simon Alexanderson Jonas Beskow 39 97 0 16 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 44 101 0 13 May 2019
Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion Orhan Ocal Oguz H. Elibol Gokce Keskin Cory Stephenson Anil Thomas Kannan Ramchandran 26 10 0 09 May 2019
Deep Learning for Audio Signal Processing Hendrik Purwins Bo-wen Li Tuomas Virtanen Jan Schlüter Shuo-yiin Chang Tara N. Sainath VLM 24 586 0 30 Apr 2019
End-to-End Spoken Language Translation Michelle Guo Albert Haque Prateek Verma 14 8 0 23 Apr 2019
Expediting TTS Synthesis with Adversarial Vocoding Paarth Neekhara Chris Donahue M. Puckette Shlomo Dubnov Julian McAuley 6 20 0 16 Apr 2019
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning Tao Tu Yuan-Jui Chen Cheng-chieh Yeh Hung-yi Lee 14 87 0 13 Apr 2019
RNN-based speech synthesis using a continuous sinusoidal model M. S. Al-Radhi T. Csapó Géza Németh 14 4 0 12 Apr 2019
Building a mixed-lingual neural TTS system with only monolingual data Liumeng Xue Wei Song Guanghui Xu Lei Xie Zhizheng Wu 17 30 0 12 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model Ye Jia Ron J. Weiss Fadi Biadsy Wolfgang Macherey Melvin Johnson Z. Chen Yonghui Wu 21 223 0 12 Apr 2019
A New GAN-based End-to-End TTS Training Algorithm Haohan Guo Frank Soong Lei He Lei Xie 26 47 0 09 Apr 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS Haohan Guo Frank Soong Lei He Lei Xie 16 30 0 09 Apr 2019
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation Fadi Biadsy Ron J. Weiss Pedro J. Moreno D. Kanevsky Ye Jia 21 112 0 08 Apr 2019
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram Lauri Juvela Bajibabu Bollepalli Junichi Yamagishi P. Alku 11 18 0 08 Apr 2019
Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data Roee Levy Leshem Raja Giryes 8 8 0 06 Apr 2019
An Unsupervised Autoregressive Model for Speech Representation Learning Yu-An Chung Wei-Ning Hsu Hao Tang James R. Glass SSL 24 407 0 05 Apr 2019
Attention-Augmented End-to-End Multi-Task Learning for Emotion Prediction from Speech Zixing Zhang Bingwen Wu Bjoern Schuller 19 83 0 29 Mar 2019
Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet Mingyang Zhang Xin Wang Fuming Fang Haizhou Li Junichi Yamagishi 6 49 0 29 Mar 2019
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis Noé Tits Fengna Wang Kevin El Haddad Vincent Pagel Thierry Dutoit DiffM 15 39 0 27 Mar 2019
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages Kyubyong Park Thomas Mulc 14 100 0 27 Mar 2019
GANSynth: Adversarial Neural Audio Synthesis Jesse Engel Kumar Krishna Agrawal Shuo Chen Ishaan Gulrajani Chris Donahue Adam Roberts 49 385 0 23 Feb 2019