Title
A General Framework for Learning Procedural Audio Models of Environmental Sounds Danzel Serrano M. Cartwright DiffM DRL 35 1 0 04 Mar 2023
LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion Chunfeng Wang Peisong Huang Yuxiang Zou Haoyu Zhang Shichao Liu Xiang Yin Zejun Ma 16 2 0 02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations N. Shah Saiteja Kosgi Vishal Tambrahalli Neha Sahipjohn Anil Nelakanti Vineet Gandhi 25 8 0 01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus Ajinkya Kulkarni Atharva Kulkarni Sara Shatnawi Hanan Aldarmaki 19 8 0 28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Jiyoung Lee Joon Son Chung Soo-Whan Chung DiffM 38 27 0 27 Feb 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion Brendan O'Connor S. Dixon 24 0 0 27 Feb 2023
Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing Alexandra Chronopoulou Brian Thompson Prashant Mathur Yogesh Virkar Surafel Melaku Lakew Marcello Federico 26 7 0 25 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee Wonbin Jung Hyunjae Cho Jaeyeon Kim Jaehwan Kim 22 3 0 24 Feb 2023
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants Domna Bilika Nikoletta Michopoulou E. Alepis Constantinos Patsakis 33 8 0 20 Feb 2023
MTTM: Metamorphic Testing for Textual Content Moderation Software Wenxuan Wang Jen-tse Huang Weibin Wu Jianping Zhang Yizhan Huang Shuqing Li Pinjia He Michael Lyu 58 30 0 11 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt Dongchao Yang Songxiang Liu Rongjie Huang Chao Weng Helen Meng DiffM VLM 31 85 0 31 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe W. El-Hajj Ahmed M. Ali 30 10 0 22 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation Xu Tan Tao Qin Jiang Bian Tie-Yan Liu Yoshua Bengio GAN 38 15 0 21 Jan 2023
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation Simbarashe Nyatsanga Taras Kucherenko Chaitanya Ahuja G. Henter Michael Neff SLR 44 90 0 13 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion Hao Liu Tao Wang Ruibo Fu Jiangyan Yi Zhengqi Wen J. Tao 23 3 0 10 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation Abdullah Shahid S. Latif Junaid Qadir 31 23 0 10 Jan 2023
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism Yukiya Hono Kei Hashimoto Yoshihiko Nankaku K. Tokuda 16 2 0 28 Dec 2022
Development and Evaluation of a Learning-based Model for Real-time Haptic Texture Rendering Negin Heravi Heather Culbertson Allison M. Okamura Jeannette Bohg DiffM 9 8 0 27 Dec 2022
Source Tracing: Detecting Voice Spoofing Tinglong Zhu Xingming Wang Xiaoyi Qin Ming Li 29 12 0 16 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language Yusuke Yasuda T. Toda 33 8 0 16 Dec 2022
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator Amrutha Prasad Juan Pablo Zuluaga P. Motlíček Seyyed Saeed Sarfjoo Iuliia Nigmatulina Karel Veselý 36 3 0 14 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis Chunyu Qiang Peng Yang Hao Che Xiaorui Wang Zhongyuan Wang BDL 34 6 0 13 Dec 2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features Junhui Zhang Junjie Pan Xiang Yin Zejun Ma 27 0 0 12 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Kailin Liang Bin Liu Yifan Hu Rui Liu F. Bao Guanglai Gao 28 1 0 11 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet Kan Huang Kai Zhang Ming-de Liu 17 2 0 08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming Yang Qin Huang 75 25 0 08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning Ankur Debnath Shridevi S Patil Gangotri Nadiger R. Ganesan 29 20 0 07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 30 2 0 06 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features T. U. K. Reddy Sahukari Chaitanya Varun Kota Pranav Kumar Sankala Sreekanth K. Murty 23 0 0 05 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis Yinjiao Lei Shan Yang Xinsheng Wang Qicong Xie Jixun Yao Linfu Xie Dan Su DiffM 21 8 0 03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech Byoung Jin Choi Myeonghun Jeong Joun Yeop Lee N. Kim 23 13 0 30 Nov 2022
Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses Yang Ai Zhenhua Ling 21 24 0 29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities Amin Azmoodeh Ali Dehghantanha 45 2 0 26 Nov 2022
Efficient Incremental Text-to-Speech on GPUs Muyang Du Chuan Liu Jiaxing Qi Junjie Lai 24 1 0 25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi Erica Cooper Xin Wang Junichi Yamagishi Shrikanth Narayanan 27 1 0 25 Nov 2022
3D human motion generation from the text via gesture action classification and the autoregressive model Gwantae Kim Youngsuk Ryu Junyeop Lee D. Han Jeongmin Bae Hanseok Ko 17 2 0 18 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar Praveen S V Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 43 18 0 17 Nov 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation Chunyu Qiang Peng Yang Hao Che Jinba Xiao Xiaorui Wang Zhongyuan Wang 20 3 0 17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 20 46 0 17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints Zhichao Wang Xinsheng Wang Linfu Xie Yuan-Jui Chen Qiao Tian Yuping Wang 30 5 0 16 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 39 12 0 13 Nov 2022
Online Phase Reconstruction via DNN-based Phase Differences Estimation Yoshiki Masuyama Kohei Yatabe Kento Nagatomo Yasuhiro Oikawa 3DV 16 7 0 12 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Xiaoran Fan Chao Pang Tian Yuan Richard He Bai Renjie Zheng ... Junkun Chen Zeyu Chen Liang Huang Yu Sun Hua Wu 40 0 0 07 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 28 6 0 07 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space Jihwan Lee Jaesung Bae Seongkyu Mun Heejin Choi Joun Yeop Lee Hoon-Young Cho Chanwoo Kim 32 2 0 06 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling Jixun Yao Qing Wang Yi Lei Pengcheng Guo Linfu Xie Namin Wang Jie Liu 38 14 0 06 Nov 2022
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech Xin Zhang Iván Vallés-Pérez A. Stolcke Chengzhu Yu J. Droppo Olabanji Shonibare Roberto Barra-Chicote Venkatesh Ravichandran 33 6 0 04 Nov 2022
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment Yuchen Liu Li-Chia Yang Alex Pawlicki Marko Stamenovic 27 6 0 04 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis Konstantinos Klapsas Karolos Nikitaras Nikolaos Ellinas June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 26 0 0 02 Nov 2022
Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation Yingjie Song Wei Song Wei Zhang Zhengchen Zhang Dan Zeng Zhi Liu Yang Yu DiffM 19 5 0 02 Nov 2022