Title
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications Biel Tura Vecino Adam Gabryś Daniel Mątwicki Andrzej Pomirski Tom Iddon Marius Cotescu Jaime Lorenzo-Trueba 42 3 0 12 May 2025
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Shivam Mehta Anna Deichler Jim O'Regan Birger Moëll Jonas Beskow G. Henter Simon Alexanderson 48 4 0 30 Apr 2024
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Georgios Milis P. Filntisis A. Roussos Petros Maragos CVBM 38 2 0 11 Dec 2023
Prosody Analysis of Audiobooks Charuta Pethe Yunting Yin Felix D Childress Yunting Yin Steven Skiena 32 1 0 10 Oct 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model Zhe Ye Wei Xue Xuejiao Tan Jie Chen Qi-fei Liu Yi-Ting Guo DiffM 32 40 0 11 May 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Junaid Qadir 46 47 0 21 Mar 2023
Pathway to Future Symbiotic Creativity Yi-Ting Guo Qi-fei Liu Jie Chen Wei Xue Jie Fu ... Fernando Rosas Jeffrey Shaw Xing Wu Jiji Zhang Jianliang Xu 34 0 0 18 Aug 2022
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody Peter Makarov Ammar Abbas Mateusz Lajszczak Arnaud Joly S. Karlapati Alexis Moinet Thomas Drugman Penny Karanasou 23 16 0 29 Jun 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder Taejun Bak Junmo Lee Hanbin Bae Jinhyeok Yang Jaesung Bae Young-Sun Joo 25 28 0 27 Jun 2022
Deep Performer: Score-to-Audio Music Performance Synthesis Hao-Wen Dong Cong Zhou Taylor Berg-Kirkpatrick Julian McAuley 27 17 0 12 Feb 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 25 97 0 19 Jan 2022
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus Rongjie Huang Feiyang Chen Yi Ren Jinglin Liu Chenye Cui Zhou Zhao 36 100 0 20 Dec 2021
Transformer-S2A: Robust and Efficient Speech-to-Animation Liyang Chen Zhiyong Wu Jun Ling Runnan Li Xu Tan Sheng Zhao 35 18 0 18 Nov 2021
Integrated Speech and Gesture Synthesis Siyang Wang Simon Alexanderson Joakim Gustafson Jonas Beskow G. Henter Éva Székely 37 19 0 25 Aug 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion Daxin Tan Liqun Deng Y. Yeung Xin Jiang Xiao Chen Tan Lee 29 38 0 04 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangens S. Karlapati Thomas Drugman 21 0 0 29 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition Zhengxi Liu Y. Qian DRL 19 10 0 25 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi Najim Dehak William Chan DiffM 23 88 0 17 Jun 2021
DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement Shubo Lv Yanxin Hu Shimin Zhang Lei Xie 24 93 0 16 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
Attention Forcing for Machine Translation Qingyun Dou Yiting Lu Potsawee Manakul Xixin Wu Mark Gales 31 7 0 02 Apr 2021
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN Cong Wang Yu Chen Bin Wang Yi Shi 35 1 0 26 Mar 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 28 73 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 27 55 0 17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 29 21 0 08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 24 98 0 06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 25 19 0 04 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 30 102 0 22 Oct 2020
FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction Qiao Tian Zewang Zhang Heng Lu Linghui Chen Shan Liu 19 22 0 12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech Geng Yang Shan Yang Kai-Chun Liu Peng Fang Wei Chen Lei Xie 66 198 0 11 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint Zexin Cai Chuxiong Zhang Ming Li 24 41 0 10 May 2020