Title
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling Yuepeng Jiang Tao Li Fengyu Yang Lei Xie Meng Meng Yujun Wang 46 2 0 09 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation Min-Jae Hwang Ilia Kulikov Benjamin Peloquin Hongyu Gong Peng-Jen Chen Ann Lee 35 2 0 04 Jun 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Daniel Lyth Simon King 29 37 0 02 Feb 2024
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 52 0 0 23 Oct 2023
Prosody Analysis of Audiobooks Charuta Pethe Yunting Yin Felix D Childress Yunting Yin Steven Skiena 27 1 0 10 Oct 2023
Cross-Utterance Conditioned VAE for Speech Generation Yongqian Li Cheng Yu Guangzhi Sun Weiqin Zu Zheng Tian ... Wei Pan Chao Zhang Jun Wang Yang Yang Fanglei Sun 21 2 0 08 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals Yiming Wu CoGe DRL 34 0 0 06 Sep 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Xixin Wu Shiyin Kang Helen Meng 35 7 0 29 Jul 2023
Controllable Speaking Styles Using a Large Language Model A. Sigurgeirsson Simon King 25 2 0 17 May 2023
Using Deepfake Technologies for Word Emphasis Detection Eran Kaufman Lee-Ad Gottlieb 35 0 0 12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings Wei Xue Yiwen Wang Qi-fei Liu Yi-Ting Guo 39 1 0 09 May 2023
Scientists' Perspectives on the Potential for Generative AI in their Fields Meredith Ringel Morris AI4CE 33 38 0 04 Apr 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 34 7 0 06 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations Yuma Koizumi Heiga Zen Shigeki Karita Yifan Ding Kohei Yatabe Nobuyuki Morioka Yu Zhang Wei Han Ankur Bapna M. Bacchiani 39 22 0 03 Mar 2023
Long-horizon video prediction using a dynamic latent hierarchy Alexey Zakharov Qinghai Guo Z. Fountas 36 4 0 29 Dec 2022
Disentangled Representation Learning Xin Wang Hong Chen Siao Tang Zihao Wu Wenwu Zhu DRL 37 78 0 21 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 28 6 0 07 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 19 0 0 01 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 23 0 0 28 Oct 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 42 6 0 22 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech Saeed Ghorbani Ylva Ferstl Daniel Holden N. Troje M. Carbonneau 41 79 0 15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech Camille Noufi Lloyd May J. Berger 27 2 0 06 Sep 2022
Pathway to Future Symbiotic Creativity Yi-Ting Guo Qi-fei Liu Jie Chen Wei Xue Jie Fu ... Fernando Rosas Jeffrey Shaw Xing Wu Jiji Zhang Jianliang Xu 34 0 0 18 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai Tom Ko Yu Zhang 27 4 0 03 Aug 2022
Generative Extraction of Audio Classifiers for Speaker Identification Tejumade Afonja Lucas Bourtoule Varun Chandrasekaran Sageev Oore Nicolas Papernot AAML 15 1 0 26 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 33 28 0 19 Jul 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui Tianyu Zhao Kei Sawada Yukiya Hono Yoshihiko Nankaku K. Tokuda 33 14 0 24 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 42 38 0 30 May 2022
SpeechPainter: Text-conditioned Speech Inpainting Zalan Borsos Matthew Sharifi Marco Tagliasacchi 16 26 0 15 Feb 2022
Unsupervised word-level prosody tagging for controllable speech synthesis Yiwei Guo Chenpeng Du Kai Yu 26 15 0 15 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 68 18 0 24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 27 73 0 17 Jan 2022
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 23 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 19 11 0 19 Nov 2021
Zero-shot Singing Technique Conversion Brendan O'Connor S. Dixon Georgy Fazekas 35 5 0 16 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech Mu Li Jonas Rohnke Antonio Bonafonte Mateusz Lajszczak Trevor Wood DRL 30 2 0 24 Oct 2021
Variational Predictive Routing with Nested Subjective Timescales Alexey Zakharov Qinghai Guo Z. Fountas BDL AI4TS 43 9 0 21 Oct 2021
PixelPyramids: Exact Inference Models from Lossless Image Pyramids Shweta Mahajan Stefan Roth TPM 15 2 0 17 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 40 39 0 20 Sep 2021
Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring Yaman Kumar Singla Avykat Gupta Shaurya Bagga Changyou Chen Balaji Krishnamurthy R. Shah 32 12 0 30 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining Zhehuai Chen Yu Zhang Andrew Rosenberg Bhuvana Ramabhadran Gary Wang Pedro J. Moreno SSL 25 36 0 27 Aug 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 26 3 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 42 9 0 18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 25 16 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 35 7 0 14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 89 843 0 11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 29 12 0 08 Jun 2021
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization A. Lahiri Vivek Kwatra C. Frueh J. P. Lewis C. Bregler 3DH 38 99 0 08 Jun 2021