Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis Zeeshan Ahmad Shudi Bao Meng Chen 20 0 0 14 May 2025
Consistent estimation of generative model representations in the data kernel perspective space Aranyak Acharyya M. Trosset Carey E. Priebe Hayden Helm DiffM 68 3 0 20 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation Ji-Hoon Kim Hong-Sun Yang Yoon-Cheol Ju Il-Hwan Kim Byeong-Yeol Kim Joon Son Chung BDL 54 0 0 31 Dec 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS Zhijun Liu Shuai Wang Pengcheng Zhu Mengxiao Bi Haizhou Li VLM DiffM 38 3 0 14 Sep 2024
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN Neeraj Kumar Ankur Narang Brejesh Lall DiffM 29 0 0 27 Oct 2023
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences Samuel Chun-Hei Lam Justin A. Sirignano K. Spiliopoulos 30 2 0 28 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding Chunyu Qiang Hao Li Hao Ni He Qu Ruibo Fu Tao Wang Longbiao Wang J. Dang DiffM 30 8 0 28 Jul 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Junaid Qadir 42 47 0 21 Mar 2023
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Kailin Liang Bin Liu Yifan Hu Rui Liu F. Bao Guanglai Gao 28 1 0 11 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming Yang Qin Huang 75 25 0 08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning Ankur Debnath Shridevi S Patil Gangotri Nadiger R. Ganesan 26 20 0 07 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features T. U. K. Reddy Sahukari Chaitanya Varun Kota Pranav Kumar Sankala Sreekanth K. Murty 23 0 0 05 Dec 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities Amin Azmoodeh Ali Dehghantanha 45 2 0 26 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 15 0 0 28 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gokcce .Iymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 20 53 0 06 Oct 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu Pengkai Yin Rui Liu F. Bao Guanglai Gao 18 5 0 22 Sep 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 34 6 0 22 Sep 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings M. Rahman M. Graciarena Diego Castán Chris Cobo-Kroenke Mitchell McLaren A. Lawson AAML 25 9 0 15 Sep 2022
On the Horizon: Interactive and Compositional Deepfakes Eric Horvitz 16 27 0 05 Sep 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yinjiao Lei Shan Yang Jian Cong Linfu Xie Dan Su DiffM 52 12 0 05 Jul 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks Shanghua Gao Zhong-Yu Li Qi Han Ming-Ming Cheng Liang Wang 32 34 0 14 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality Xu Tan Jiawei Chen Haohe Liu Jian Cong Chen Zhang ... Lei He Frank Soong Tao Qin Sheng Zhao Tie-Yan Liu 44 213 0 09 May 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training J. Yang Lei He 36 11 0 20 Jan 2022
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey Zahra Khanjani Gabrielle Watson V. P Janeja 25 25 0 28 Nov 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 23 0 25 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 10 17 0 07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schonherr DiffM 129 123 0 04 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 36 39 0 15 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 10 16 0 08 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks E. Hortal Rodrigo Brechard Alarcia GAN 26 2 0 06 Oct 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints Ji-Hoon Kim Sang-Hoon Lee Ji-Hyun Lee Hong G Jung Seong-Whan Lee 47 6 0 16 Aug 2021
Video Generation from Text Employing Latent Path Construction for Temporal Modeling Amir Mazaheri M. Shah 30 8 0 29 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery Muvazima Mansoor Srikanth Chandar Ramamoorthy Srinath 26 0 0 27 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 20 30 0 21 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 25 160 0 06 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review Jabeen Summaira Xi Li Amin Muhammad Shoib Songyuan Li Abdul Jabbar HAI 18 55 0 24 May 2021
RotLSTM: Rotating Memories in Recurrent Neural Networks Vlad Velici Adam Prugel-Bennett RALM VLM 17 1 0 01 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction Stanislav Beliaev Boris Ginsburg 21 8 0 16 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 45 81 0 28 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice Mingjian Chen Xu Tan Bohan Li Yanqing Liu Tao Qin Sheng Zhao Tie-Yan Liu VLM DiffM 37 187 0 01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 42 22 0 12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet Yunlong Jiao Adam Gabry's Georgi Tinchev Bartosz Putrycz Daniel Korzekwa V. Klimkov 36 42 0 01 Feb 2021
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things Jing Zhang Dacheng Tao 45 462 0 17 Nov 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis Zhifeng Kong Ming-Yu Liu Jiaji Huang Kexin Zhao Bryan Catanzaro DiffM BDL 34 1,392 0 21 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis Jiawei Chen Xu Tan Jian Luan Tao Qin Tie-Yan Liu VLM 19 92 0 03 Sep 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder Hyun-Wook Yoon Sang-Hoon Lee Hyeong-Rae Noh Seong-Whan Lee 20 11 0 16 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning Berrak Sisman Junichi Yamagishi Simon King Haizhou Li BDL 41 317 0 09 Aug 2020
A Transfer Learning End-to-End ArabicText-To-Speech (TTS) Deep Architecture Fady K. Fahmy M. Khalil Hazem M. Abbas 41 20 0 22 Jul 2020