ResearchTrend.AI

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

22 December 2023
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi

Papers citing "ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations"

17 / 17 papers shown
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
48
0
0
11 Apr 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
54
0
0
31 Dec 2024
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
38
1
0
30 Sep 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
32
3
0
30 Sep 2024
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Haoyu Wang
Chunyu Qiang
Tianrui Wang
Cheng Gong
Qiuyu Liu
Tianwei Zhang
Xiaobao Wang
Chenyang Wang
Chen Zhang
40
1
0
27 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
56
3
0
23 Sep 2024
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Tien-Hong Lo
Meng-Ting Tsai
Berlin Chen
32
0
0
11 Sep 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
39
0
16 Aug 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
37
2
0
13 Jun 2024
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Yifeng Yu
Jiatong Shi
Yuning Wu
Shinji Watanabe
38
3
0
13 Jun 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
41
2
0
27 Sep 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
42
11
0
22 Sep 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
185
379
0
04 Dec 2021
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
820
0
12 Jun 2018