M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

3 May 2023

Papers citing "M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis"


CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang, Y. Qian, Xiaofei Wang, Manthan Thakker, Dongmei Wang, ..., Haibin Wu, Yuxuan Hu, Jinyu Li, Yanmin Qian, Sheng Zhao
35 0 0 | 01 Jun 2025

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li
RALM VLM | 136 7 0 | 06 Jun 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li
95 4 0 | 06 Jun 2024

Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić
78 7 0 | 05 Jan 2024

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis
Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li
89 7 0 | 16 Dec 2023

Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui, Yukiya Hono, Kei Sawada
88 14 0 | 02 Oct 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, N. Mesgarani
VLM DiffM | 145 126 0 | 13 Jun 2023