FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset. Hasam Khalid, Shahroz Tariq, Minha Kim, Simon S. Woo. 11 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person. Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, O. Scharenborg. 09 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features. Gwantae Kim, D. Han, Hanseok Ko. 06 Aug 2021
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures. Dengfeng Ke, Yuxing Lu, Xudong Liu, Yanyan Xu, Jing Sun, Cheng-Hao Cai. 06 Aug 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning. Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee. 05 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis. Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, M. Carbonneau. 04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis. Xudong Dai, Cheng Gong, Longbiao Wang, Kaili Zhang. 04 Aug 2021
Creation and Detection of German Voice Deepfakes. Vanessa Barnekow, Dominik Binder, Niclas Kromrey, Pascal Munaretto, A. Schaad, Felix Schmieder. 02 Aug 2021
End to End Bangla Speech Synthesis. Prithwiraj Bhattacharjee, Rajan Saha Raju, Arif Ahmad, M. S. Rahman. 01 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing. Zhaofeng Shi. 01 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis. Shifeng Pan, Lei He. 27 Jul 2021
Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations. L. Benaroya, Nicolas Obin, Axel Roebel. 26 Jul 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations. Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang. 26 Jul 2021
Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI. J. Chubb, S. Missaoui, S. Concannon, Liam Maloney, James Alfred Walker. 20 Jul 2021
Human Perception of Audio Deepfakes. Nicolas Müller, Karla Markert, Konstantin Böttinger. 20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio. J. Weston, R. Lenain, U. Meepegama, E. Fristed. 17 Jul 2021
Direct speech-to-speech translation with discrete units. Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, ..., Yossi Adi, Qing He, Yun Tang, J. Pino, Wei-Ning Hsu. 12 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis. Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng. 07 Jul 2021
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style. Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan-Chung Shen, Weiqiang Zhang, Tie-Yan Liu. 06 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures. Prateek Verma, C. Chafe. 30 Jun 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech. Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangen, S. Karlapati, Thomas Drugman. 29 Jun 2021
A Survey on Neural Speech Synthesis. Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu. 29 Jun 2021
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement. Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, Min-Ji Lee, Young-Ik Kim, Hoon-Young Cho. 29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis. Jinhyeok Yang, Jaesung Bae, Taejun Bak, Young-Ik Kim, Hoon-Young Cho. 29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis. Taejun Bak, Jaesung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho. 29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery. Muvazima Mansoor, Srikanth Chandar, Ramamoorthy Srinath. 27 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control. M. Kang, Sungjae Kim, Injung Kim. 21 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis. Jian Cong, Shan Yang, Lei Xie, Dan Su. 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis. Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su. 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS. Xiaochun An, Frank Soong, Lei Xie. 18 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan. 17 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao. 17 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion. Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li. 16 Jun 2021
RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis. Rohola Zandie, Mohammad H. Mahoor, Julia Madsen, Eshrat S. Emamian. 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system. Panagiota Karanasou, S. Karlapati, Alexis Moinet, Arnaud Joly, Ammar Abbas, Simon Slangen, Jaime Lorenzo-Trueba, Thomas Drugman. 14 Jun 2021
Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis. M. S. Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh. 12 Jun 2021
HUI-Audio-Corpus-German: A high quality TTS dataset. Pascal Puchtler, Johannes Wirth, René Peinl. 11 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling. Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su. 11 Jun 2021
Sprachsynthese – State-of-the-Art in englischer und deutscher Sprache [Speech Synthesis: State of the Art in English and German]. René Peinl. 11 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows. Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, J. Droppo. 10 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS. Liping Chen, Yan Deng, Xi Wang, Frank Soong, Lei He. 08 Jun 2021
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation. Dong Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang. 06 Jun 2021
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis. Beáta Lőrincz, Adriana Stan, M. Giurgiu. 03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis. Beáta Lőrincz, Adriana Stan, M. Giurgiu. 03 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review. Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Abdul Jabbar. 24 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation. Shoule Wu, Ziqiang Shi. 17 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework. Jinyin Chen, Linhui Ye, Zhaoyan Ming. 10 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space. Pol van Rijn, Silvan Mertes, Dominik Schiller, Peter M. C. Harrison, P. Larrouy-Maestri, Elisabeth André, Nori Jacoby. 05 May 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic. 27 Apr 2021
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis. Kosuke Futamata, Byeong-Cheol Park, Ryuichi Yamamoto, Kentaro Tachibana. 26 Apr 2021