Enhancing Speech-to-Speech Translation with Multiple TTS Targets Jiatong Shi Yun Tang Ann Lee Hirofumi Inaguma Changhan Wang J. Pino Shinji Watanabe 77 9 0 10 Apr 2023
ArmanTTS single-speaker Persian dataset Mohammd Hasan Shamgholi Vahid Saeedi J. Peymanfard Leila Alhabib Hossein Zeinali 48 2 0 07 Apr 2023
AraSpot: Arabic Spoken Command Spotting Mahmoud Salhab H. Harmanani 70 0 0 29 Mar 2023
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages Seong-Hyun Park Myungseo Song Bohyung Kim Tae-Hyun Oh 40 1 0 28 Mar 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Shogo Seki 57 9 0 24 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI Chenshuang Zhang Chaoning Zhang Sheng Zheng Mengchun Zhang Maryam Qamar Sung-Ho Bae In So Kweon DiffM MedIm 134 73 0 23 Mar 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning Sung-Feng Huang Chia-Ping Chen Zhi-Sheng Chen Yu-Pao Tsai Hung-yi Lee 83 3 0 21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? Chaoning Zhang Chenshuang Zhang Sheng Zheng Yu Qiao Chenghao Li ... Lik-Hang Lee Yang Yang Heng Tao Shen In So Kweon Choong Seon Hong 193 170 0 21 Mar 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Muhammad Usama Junaid Qadir 176 48 0 21 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture Julien Hauret Thomas Joubaud V. Zimpfer Éric Bavu 61 7 0 17 Mar 2023
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022 Taras Kucherenko Pieter Wolfert Youngwoo Yoon Carla Viegas Teodor Nikolov Mihail Tsakov G. Henter 78 24 0 15 Mar 2023
Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports Hyunseung Chung Jiho Kim Joon-Myoung Kwon K. Jeon Min Sung Lee Edward Choi MedIm 80 16 0 09 Mar 2023
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson Simon King DiffM 65 8 0 07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 120 7 0 06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS Siyang Wang G. Henter Joakim Gustafson Éva Székely 71 5 0 05 Mar 2023
A General Framework for Learning Procedural Audio Models of Environmental Sounds Danzel Serrano M. Cartwright DiffM DRL 70 1 0 04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations Yuma Koizumi Heiga Zen Shigeki Karita Yifan Ding Kohei Yatabe Nobuyuki Morioka Yu Zhang Wei Han Ankur Bapna M. Bacchiani 94 29 0 03 Mar 2023
Speaker-Aware Anti-Spoofing Xuechen Liu Md. Sahidullah Kong Aik Lee Tomi Kinnunen 86 3 0 02 Mar 2023
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities Shijun Wang Jón Guðnason Damian Borth 83 10 0 02 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding Yingting Li Ambuj Mehrish Shuaijiang Zhao Rishabh Bhardwaj Amir Zadeh Navonil Majumder Rada Mihalcea Soujanya Poria AAML 66 18 0 02 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization Kohei Matsuura Takanori Ashihara Takafumi Moriya Tomohiro Tanaka A. Ogawa Marc Delcroix Ryo Masumura 49 14 0 02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations N. Shah Saiteja Kosgi Vishal Tambrahalli Neha Sahipjohn Anil Nelakanti Vineet Gandhi 76 8 0 01 Mar 2023
DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction R. Anantha Kriti Bhasin Daniela Aguilar Prabal Vashisht Becci Williamson Srinivas Chappidi 65 0 0 01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus Ajinkya Kulkarni Atharva Kulkarni Sara Shatnawi Hanan Aldarmaki 37 9 0 28 Feb 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis Ji-Hoon Kim Hongying Yang Yooncheol Ju Il-Hwan Kim Byeong-Yeol Kim 81 9 0 28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui Yukiya Hono Kei Sawada CVBM 54 7 0 28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Jiyoung Lee Joon Son Chung Soo-Whan Chung DiffM 101 31 0 27 Feb 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion Brendan O'Connor S. Dixon 48 0 0 27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow Yoonhyung Lee Jinhyeok Yang Kyomin Jung 67 6 0 27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee Wonbin Jung Hyunjae Cho Jaeyeon Kim Jaehwan Kim 88 3 0 24 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition Leyuan Qu C. Weber S. Wermter 67 10 0 20 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion Houjian Guo Chaoran Liu C. Ishi H. Ishiguro BDL 100 13 0 16 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages Sudhanshu Srivastava Ishika Gupta Anusha Prakash Jom Kuriakose H. Murthy VLM 72 1 0 13 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech Li-Wei Chen Shinji Watanabe Alexander I. Rudnicky 84 37 0 08 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision Eugene Kharitonov Damien Vincent Zalan Borsos Raphaël Marinier Sertan Girgin Olivier Pietquin Matthew Sharifi Marco Tagliasacchi Neil Zeghidour 103 206 0 07 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications Muhammad Arslan Manzoor S. Albarri Ziting Xian Zaiqiao Meng Preslav Nakov Shangsong Liang AI4TS 107 32 0 01 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining Takaaki Saeki Soumi Maiti Xinjian Li Shinji Watanabe Shinnosuke Takamichi Hiroshi Saruwatari 111 18 0 30 Jan 2023
On granularity of prosodic representations in expressive text-to-speech Mikolaj Babianski Kamil Pokora Raahil Shah Rafał Sienkiewicz Daniel Korzekwa V. Klimkov 66 6 0 26 Jan 2023
On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems Philippe Gonzalez T. S. Alstrøm Tobias May 80 9 0 25 Jan 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Siddharth Gururani Bryan Catanzaro 66 6 0 24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe W. El-Hajj Ahmed M. Ali 60 11 0 22 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation Xu Tan Tao Qin Jiang Bian Tie-Yan Liu Yoshua Bengio GAN 64 15 0 21 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions Yinghao Aaron Li Cong Han Xilin Jiang N. Mesgarani 66 24 0 20 Jan 2023
Msanii: High Fidelity Music Synthesis on a Shoestring Budget Kinyugo Maina 85 7 0 16 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend Georgi Tinchev Marta Czarnowska Kamil Deja K. Yanagisawa Marius Cotescu 80 4 0 11 Jan 2023
Dual Learning for Large Vocabulary On-Device ASR Cal Peyser Ronny Huang Tara N. Sainath Rohit Prabhavalkar M. Picheny K. Cho SSL 63 1 0 11 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation Abdullah Shahid S. Latif Junaid Qadir 67 23 0 10 Jan 2023
Introducing Model Inversion Attacks on Automatic Speaker Recognition Karla Pizzi Franziska Boenisch U. Sahin Konstantin Böttinger 117 3 0 09 Jan 2023
SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain Heli Qi Sashi Novitasari Andros Tjandra S. Sakti Satoshi Nakamura 82 3 0 08 Jan 2023
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation Miku Nishihara Yukiya Hono Kei Hashimoto Yoshihiko Nankaku K. Tokuda 118 1 0 05 Jan 2023