Title
Audio-Linguistic Embeddings for Spoken Sentences. Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei. 28 51 0. 20 Feb 2019
Insertion Transformer: Flexible Sequence Generation via Insertion Operations. Mitchell Stern, William Chan, J. Kiros, Jakob Uszkoreit. KELM. 31 247 0. 08 Feb 2019
Exploring Transfer Learning for Low Resource Emotional TTS. Noé Tits, Kevin El Haddad, Thierry Dutoit. 17 61 0. 14 Jan 2019
Efficient Convolutional Neural Network Training with Direct Feedback Alignment. Donghyeon Han, H. Yoo. 3DV. 16 17 0. 06 Jan 2019
Introduction to Voice Presentation Attack Detection and Recent Advances. Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee. AAML. 13 75 0. 04 Jan 2019
Feature reinforcement with word embedding and parsing information in neural TTS. Huaiping Ming, Lei He, Haohan Guo, Frank Soong. 74 15 0. 03 Jan 2019
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice. Yan Deng, Lei He, Frank Soong. 63 29 0. 13 Dec 2018
Learning pronunciation from a foreign language in speech synthesis networks. Younggun Lee, Suwon Shon, Taesu Kim. 22 26 0. 23 Nov 2018
Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision. Jing-Xuan Zhang, Zhenhua Ling, Yuan Jiang, Li-Juan Liu, Chen Liang, Lirong Dai. 17 29 0. 20 Nov 2018
Effect of data reduction on sequence-to-sequence neural TTS. Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, S. Ronanki, Viacheslav Klimkov. 38 59 0. 15 Nov 2018
Comprehensive evaluation of statistical speech waveform synthesis. Thomas Merritt, Bartosz Putrycz, Adam Nadolski, Tianjun Ye, Daniel Korzekwa, ..., Alexis Moinet, A. Breen, Rafal Kuklinski, N. Strom, Roberto Barra-Chicote. 19 17 0. 15 Nov 2018
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms. Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo. 17 111 0. 09 Nov 2018
Robust and fine-grained prosody control of end-to-end speech synthesis. Younggun Lee, Jonathan Le Roux. 7 147 0. 06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation. Ye Jia, Melvin Johnson, Wolfgang Macherey, Ron J. Weiss, Yuan Cao, Chung-Cheng Chiu, Naveen Ari, Stella Laurenzo, Yonghui Wu. 31 159 0. 05 Nov 2018
ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion. Hirokazu Kameoka, Kou Tanaka, Damian Kwaśny, Takuhiro Kaneko, Nobukatsu Hojo. 23 62 0. 05 Nov 2018
Investigating context features hidden in End-to-End TTS. Kohki Mametani, T. Kato, Seiichi Yamamoto. 12 9 0. 04 Nov 2018
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator. Andros Tjandra, S. Sakti, Satoshi Nakamura. 13 44 0. 31 Oct 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis. R. Prenger, Rafael Valle, Bryan Catanzaro. 37 1,023 0. 31 Oct 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention. Bajibabu Bollepalli, Lauri Juvela, P. Alku. 17 4 0. 29 Oct 2018
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language. Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi. 22 86 0. 29 Oct 2018
Reducing over-smoothness in speech synthesis using Generative Adversarial Networks. Leyuan Sheng, Evgeny Nikolaevich Pavlovskiy. GAN. 17 8 0. 25 Oct 2018
SING: Symbol-to-Instrument Neural Generator. Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis R. Bach. 18 59 0. 23 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis. Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, ..., Ye Jia, Z. Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang. BDL. 12 274 0. 16 Oct 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion. Jing-Xuan Zhang, Zhenhua Ling, Li-Juan Liu, Yuan Jiang, Lirong Dai. 11 129 0. 16 Oct 2018
Conditional WaveGAN. Chae Young Lee, Anoop Toffy, G. Jung, W. Han. DiffM. 21 21 0. 27 Sep 2018
Sample Efficient Adaptive Text-to-Speech. Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott E. Reed, ..., Ben Laurie, Çağlar Gülçehre, Aaron van den Oord, Oriol Vinyals, Nando de Freitas. 35 149 0. 27 Sep 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis. Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan. 22 117 0. 30 Aug 2018
Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences. Cheng-chieh Yeh, Po-Chun Hsu, Ju-Chieh Chou, Hung-yi Lee, Lin-Shan Lee. 30 23 0. 09 Aug 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis. Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan. 13 122 0. 04 Aug 2018
Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model. Andros Tjandra, S. Sakti, Satoshi Nakamura. 8 12 0. 22 Jul 2018
Noise Adaptive Speech Enhancement using Domain Adversarial Training. Chien-Feng Liao, Yu Tsao, Hung-yi Lee, H. Wang. 17 51 0. 19 Jul 2018
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech. Wei Ping, Kainan Peng, Jitong Chen. 12 342 0. 19 Jul 2018
Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis. Jing-Xuan Zhang, Zhenhua Ling, Lirong Dai. 13 83 0. 18 Jul 2018
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network. Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari. 13 42 0. 10 Jul 2018
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems. Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit. 6 79 0. 25 Jun 2018
A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes. B. Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, Philip N. Garner. 11 7 0. 22 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis. Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, ..., Z. Chen, Patrick Nguyen, Ruoming Pang, Ignacio López Moreno, Yonghui Wu. 207 820 0. 12 Jun 2018
Voice Imitating Text-to-Speech Neural Networks. Younggun Lee, Taesu Kim, Soo-Young Lee. 26 11 0. 04 Jun 2018
Collapsed speech segment detection and suppression for WaveNet vocoder. Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, T. Toda. 9 25 0. 30 Apr 2018
Automatic Documentation of ICD Codes with Far-Field Speech Recognition. Albert Haque, Corinna Fukushima. 11 0 0. 30 Apr 2018
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations. Ju-Chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-Shan Lee. 6 132 0. 09 Apr 2018
A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis. Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi. 20 67 0. 07 Apr 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder. K. Akuzawa, Yusuke Iwasawa, Y. Matsuo. 8 138 0. 06 Apr 2018
Conditional End-to-End Audio Transforms. Albert Haque, Michelle Guo, Prateek Verma. 33 41 0. 30 Mar 2018
Machine Speech Chain with One-shot Speaker Adaptation. Andros Tjandra, S. Sakti, Satoshi Nakamura. 25 55 0. 28 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. RJ Skerry-Ryan, Eric Battenberg, Y. Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, R. Clark, Rif A. Saurous. 16 547 0. 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Y. Xiao, Fei Ren, Ye Jia, Rif A. Saurous. 21 815 0. 23 Mar 2018
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data. Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen. 6 73 0. 02 Mar 2018
Fitting New Speakers Based on a Short Untranscribed Sample. Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf. 24 84 0. 20 Feb 2018
Adversarial Audio Synthesis. Chris Donahue, Julian McAuley, M. Puckette. GAN. 45 602 0. 12 Feb 2018