Title
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement Wei Song Ya Yue Ya-Jie Zhang Zhengchen Zhang Youzheng Wu Xiaodong He 32 4 0 02 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 43 18 0 01 Nov 2022
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices K. Markopoulos Georgia Maniati G. Vamvoukakis Nikolaos Ellinas Georgios Vardaxoglou ... Gunu Jho Inchul Hwang Aimilios Chalamandaris Pirros Tsiakoulis S. Raptis 44 1 0 01 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Georgia Maniati Panos Kakoulidis June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 29 2 0 31 Oct 2022
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection L. Attorresi Davide Salvi Clara Borrelli Paolo Bestagini Stefano Tubaro 21 22 0 31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis Anusha Prakash H. Murthy 34 0 0 31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 23 0 0 28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata Ryuichi Yamamoto Eunwoo Song Ryo Terashima Jae-Min Kim Kentaro Tachibana 31 10 0 28 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis Yifan Hu Rui Liu Guanglai Gao Haizhou Li 152 7 0 27 Oct 2022
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech Kyumin Park Keon Lee Daeyoung Kim Dongyeop Kang 26 0 0 26 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao AI4TS 28 5 0 25 Oct 2022
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion Kun Zhou Berrak Sisman Carlos Busso Bin Ma Haizhou Li 37 3 0 25 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang 36 0 0 24 Oct 2022
Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS Chunyu Qiang J. Tao Ruibo Fu Zhengqi Wen Jiangyan Yi Tao Wang Shiming Wang 11 0 0 20 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models Aya Watanabe Shinnosuke Takamichi Yuki Saito Detai Xin Hiroshi Saruwatari 45 3 0 18 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images Hien Ohnaka Shinnosuke Takamichi Keisuke Imoto Yuki Okamoto Kazuki Fujii Hiroshi Saruwatari DiffM 24 8 0 17 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 27 12 0 14 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion Yuxiang Zhang Jingze Lu Xingming Wang Zhuo Li Runqiu Xiao Wenchao Wang Ming Li Pengyuan Zhang 46 5 0 13 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages Thibault Sellam Ankur Bapna Joshua Camp Diana Mackinnon Ankur P. Parikh Jason Riesa 35 17 0 12 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection Piotr Kawa Marcin Plata P. Syga 37 14 0 12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 27 5 0 12 Oct 2022
Style-Guided Inference of Transformer for High-resolution Image Synthesis Jonghwa Yim Minjae Kim ViT 37 0 0 11 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gökçe İymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 20 53 0 06 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection Daniele Mari Federica Latora Simone Milani 15 11 0 06 Oct 2022
A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion Muhan Na Rui Liu Feilong Guanglai Gao 35 0 0 24 Sep 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu Pengkai Yin Rui Liu F. Bao Guanglai Gao 18 5 0 22 Sep 2022
AutoLV: Automatic Lecture Video Generator Wen Wang Yang Song Sanjay Jha VGen 29 3 0 19 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 27 23 0 14 Sep 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe Louis Goldstein A. Black Gopala K. Anumanchipalli 39 25 0 13 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation Peining Zhang Junliang Guo Linli Xu Mu You Junming Yin 24 0 0 05 Sep 2022
Evaluating generative audio systems and their metrics Ashvala Vinay Alexander Lerch 35 19 0 31 Aug 2022
Visualising Model Training via Vowel Space for Text-To-Speech Systems Binu Abeysinghe Jesin James C. Watson Felix Marattukalam 32 2 0 21 Aug 2022
Fully Automated End-to-End Fake Audio Detection Chenglong Wang Jiangyan Yi J. Tao Haiyang Sun Xun Chen Zhengkun Tian Haoxin Ma Cunhang Fan Ruibo Fu 26 28 0 20 Aug 2022
Pathway to Future Symbiotic Creativity Yi-Ting Guo Qi-fei Liu Jie Chen Wei Xue Jie Fu ... Fernando Rosas Jeffrey Shaw Xing Wu Jiji Zhang Jianliang Xu 34 0 0 18 Aug 2022
Enhancing Audio Perception of Music By AI Picked Room Acoustics Prateek Verma J. Berger 21 0 0 16 Aug 2022
Speech Synthesis with Mixed Emotions Kun Zhou Berrak Sisman R. Rana B. W. Schuller Haizhou Li 27 44 0 11 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset Xiang Li Changhe Song X. Wei Zhiyong Wu Jia Jia Helen Meng 29 4 0 10 Aug 2022
AdaCat: Adaptive Categorical Discretization for Autoregressive Models Qiyang Li Ajay Jain Pieter Abbeel OffRL 45 4 0 03 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 27 4 0 03 Aug 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network Da-Rong Liu Po-Chun Hsu Yi-Chen Chen Sung-Feng Huang Shun-Po Chuang Da-Yi Wu Hung-yi Lee GAN 31 7 0 29 Jul 2022
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation Artem Ploujnikov Mirco Ravanelli 9 18 0 27 Jul 2022
A Proposal for Foley Sound Synthesis Challenge Keunwoo Choi Sangshin Oh Minsung Kang Brian McFee 26 11 0 21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation Dongchao Yang Jianwei Yu Helin Wang Wen Wang Chao Weng Yuexian Zou Dong Yu DiffM 36 297 0 20 Jul 2022
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting Thierry Desot François Portet Michel Vacher 27 12 0 17 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement Rodolfo Zevallos Núria Bel Guillermo Cámbara Mireia Farrús Jordi Luque VLM SyDa 19 6 0 14 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 44 195 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 36 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 26 5 0 13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection Haoxin Ma Jiangyan Yi Chenglong Wang Xin Yan J. Tao Tao Wang Shiming Wang Ruibo Fu 24 26 0 12 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion Wen-Chin Huang Shu-Wen Yang Tomoki Hayashi T. Toda 21 15 0 10 Jul 2022