Title
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis Erica Cooper Xin Wang Junichi Yamagishi 39 6 0 25 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data Yuzi Yan Xu Tan Bohan Li Tao Qin Sheng Zhao Yuan-Chung Shen Tie-Yan Liu 20 45 0 20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset Saida Mussakhojayeva Aigerim Janaliyeva A. Mirzakhmetov Yerbolat Khassanov H. A. Varol 17 14 0 17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction Stanislav Beliaev Boris Ginsburg 27 8 0 16 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion Hirokazu Kameoka Kou Tanaka Takuhiro Kaneko 39 21 0 14 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis Yixuan Zhou Changhe Song Jingbei Li Zhiyong Wu Yanyao Bian Dan Su Helen Meng 46 6 0 14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion Tomoki Hayashi Wen-Chin Huang Kazuhiro Kobayashi Tomoki Toda 14 23 0 14 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts Yang Gao Tyler Vuong Mahsa Elyasi Gaurav Bharaj Rita Singh 26 20 0 08 Apr 2021
Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features Mahsa Elyasi Gaurav Bharaj 19 2 0 08 Apr 2021
Phoneme-based Distribution Regularization for Speech Enhancement Yajing Liu Xiulian Peng Zhiwei Xiong Yan Lu 10 4 0 08 Apr 2021
Half-Truth: A Partially Fake Audio Detection Dataset Jiangyan Yi Ye Bai J. Tao Haoxin Ma Zhengkun Tian Chenglong Wang Tao Wang Ruibo Fu 21 82 0 08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis Xiang Li Changhe Song Jingbei Li Zhiyong Wu Jia Jia Helen Meng 25 47 0 08 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech Myeonghun Jeong Hyeongju Kim Sung Jun Cheon Byoung Jin Choi N. Kim DiffM 25 191 0 03 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability Rui Liu Berrak Sisman Haizhou Li 39 32 0 03 Apr 2021
Attention Forcing for Machine Translation Qingyun Dou Yiting Lu Potsawee Manakul Xixin Wu Mark Gales 33 7 0 02 Apr 2021
Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling Qing He Zhiping Xiu T. Koehler Jilong Wu 24 7 0 01 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech M. Kang Jihyun Lee Simin Kim Injung Kim 8 6 0 01 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training Kun Zhou Berrak Sisman Haizhou Li 28 27 0 31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 52 81 0 28 Mar 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis Hamed Hemati Damian Borth CLL 27 9 0 26 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee Kyumin Park Daeyoung Kim 24 30 0 17 Mar 2021
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system Noé Tits Kevin El Haddad Thierry Dutoit 24 5 0 06 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice Mingjian Chen Xu Tan Bohan Li Yanqing Liu Tao Qin Sheng Zhao Tie-Yan Liu VLM DiffM 42 188 0 01 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward Momina Masood M. Nawaz K. Malik A. Javed Aun Irtaza AAML 128 299 0 25 Feb 2021
AudioVisual Speech Synthesis: A brief literature review Efthymios Georgiou Athanasios Katsamanis 21 0 0 18 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 42 22 0 12 Feb 2021
Onoma-to-wave: Environmental sound synthesis from onomatopoeic words Yuki Okamoto Keisuke Imoto Shinnosuke Takamichi Ryosuke Yamanishi Takahiro Fukumori Y. Yamashita 13 14 0 11 Feb 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search Renqian Luo Xu Tan Rui Wang Tao Qin Jinzhu Li Sheng Zhao Enhong Chen Tie-Yan Liu 22 58 0 08 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network Chenpeng Du K. Yu 36 17 0 01 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet Shilu Lin Fenglong Xie Li Meng Xinhui Li Li Lu 11 0 0 30 Jan 2021
Expressive Neural Voice Cloning Paarth Neekhara Shehzeen Samarah Hussain Shlomo Dubnov F. Koushanfar Julian McAuley DiffM 35 30 0 30 Jan 2021
High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion M. S. Al-Radhi 16 1 0 25 Jan 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss Eunwoo Song Ryuichi Yamamoto Min-Jae Hwang Jin-Seob Kim Ohsung Kwon Jae-Min Kim 19 14 0 19 Jan 2021
Whispered and Lombard Neural Speech Synthesis Qiong Hu T. Bleisch Petko N. Petkov T. Raitio Erik Marchi V. Lakshminarasimhan 12 14 0 13 Jan 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks B. Yousaf Muhammad Usama Waqas Sultani Arif Mahmood Junaid Qadir 25 8 0 03 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David Harwath Christopher Song James R. Glass CLIP 37 66 0 31 Dec 2020
Unified Mandarin TTS Front-end Based on Distilled BERT Model Yang Zhang Liqun Deng Yasheng Wang 21 24 0 31 Dec 2020
Building Multilingual TTS using Cross-Lingual Voice Conversion Qinghua Sun Kenji Nagamatsu 6 3 0 28 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 8 16 0 23 Dec 2020
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal Changhe Song Jingbei Li Yixuan Zhou Zhiyong Wu Helen Meng 30 6 0 13 Dec 2020
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch Joseph P. Turian Max Henry 24 29 0 08 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture Chenfeng Miao Shuang Liang Zhencheng Liu Minchuan Chen Jun Ma Shaojun Wang Jing Xiao 22 38 0 07 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution Zhen Zeng Jianzong Wang Ning Cheng Jing Xiao 22 8 0 03 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 16 9 0 03 Dec 2020
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge Bichen Wu Qing He Peizhao Zhang T. Koehler Kurt Keutzer Peter Vajda 31 6 0 25 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 28 73 0 17 Nov 2020
Accent and Speaker Disentanglement in Many-to-many Voice Conversion Zhichao Wang Wenshuo Ge Xiong Wang Shan Yang Wendong Gan Haitao Chen Hai Li Lei Xie Xiulin Li CVBM 41 32 0 17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis Xi Wang Huaiping Ming Lei He Frank Soong 19 5 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 27 55 0 17 Nov 2020