ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.09116
  4. Cited By
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot
  Speech and Singing Synthesizers

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

18 April 2023
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
    DiffM
ArXivPDFHTML

Papers citing "NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers"

32 / 32 papers shown
Title
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Y. Wang
Chaoren Wang
Zehan Li
Zhuo Chen
Zhizheng Wu
135
0
0
07 May 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
31
0
0
25 Apr 2025
Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People
Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People
Rie Kamikubo
Seita Kayukawa
Yuka Kaniwa
Allan Wang
Hernisa Kacorri
Hironobu Takagi
Chieko Asakawa
137
0
0
27 Mar 2025
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Shuqi Dai
Yunyun Wang
Roger B. Dannenberg
Zeyu Jin
DiffM
56
0
0
23 Jan 2025
MathReader : Text-to-Speech for Mathematical Documents
MathReader : Text-to-Speech for Mathematical Documents
Sieun Hyeon
Kyudan Jung
N. Kim
Hyun Gon Ryu
Jaeyoung Do
36
1
0
13 Jan 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Yongjun Xu
Yizhi Zhou
Haina Zhu
Hao Li
KELM
176
1
0
18 Dec 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
45
2
0
16 Oct 2024
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Min-Si Ren
Yan-Ming Zhang
Yi Chen
31
0
0
03 Oct 2024
SongTrans: An unified song transcription and alignment method for lyrics
  and notes
SongTrans: An unified song transcription and alignment method for lyrics and notes
Siwei Wu
Jinzheng He
Ruibin Yuan
Haojie Wei
Xipin Wei
Chenghua Lin
Jin Xu
Junyang Lin
45
1
0
22 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style
  Temporal Modeling in Text-to-Speech
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Xuefei Liu
Guanjun Li
DiffM
28
0
0
18 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
135
0
0
14 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
36
2
0
01 Sep 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
49
0
0
20 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like
  Spontaneous Representation
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
35
1
0
01 Aug 2024
TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation
TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation
Jian Qian
Miao Sun
Sifan Zhou
Biao Wan
Minhao Li
Patrick Chiang
39
7
0
05 Jul 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody
  Modeling
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
38
2
0
09 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
  Modeling of Audio Discrete Codes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
38
3
0
05 Jun 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
49
41
0
30 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
24
25
0
25 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time
  Zero-Shot Voice Conversion
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
22
6
0
19 Jan 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
26
0
15 Dec 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
26
79
0
14 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model
  and Language Model: A Comparative Study of Semantic Coding
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
47
4
0
06 Jun 2023
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Rabeeh Karimi Mahabadi
Hamish Ivison
Jaesung Tae
James Henderson
Iz Beltagy
Matthew E. Peters
Arman Cohan
34
20
0
15 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency
  Model
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
30
40
0
11 May 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
31
7
0
06 Mar 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
48
641
0
05 Jan 2023
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial
  Vector-Quantized Auto-Encoders
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu
Rui Xue
Lei He
Xu Tan
Sheng Zhao
25
24
0
11 Jul 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse
  Text-to-Speech Synthesis
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
185
378
0
04 Dec 2021
1