ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.03100
  4. Cited By
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

5 March 2024
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
Dongchao Yang
Yanqing Liu
Yichong Leng
Kaitao Song
Siliang Tang
Zhizheng Wu
Tao Qin
Xiang-Yang Li
Wei Ye
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
    DiffM
ArXivPDFHTML

Papers citing "NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models"

50 / 114 papers shown
Title
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
19
0
0
14 May 2025
Introducing voice timbre attribute detection
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
22
1
0
14 May 2025
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
Yicheng Gu
Chaoren Wang
J. Zhang
Xueyao Zhang
Zihao Fang
Haorui He
Zhizheng Wu
18
2
0
14 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
H. M. Zhang
...
Yichen Xiao
Yiying Zhou
Y. Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
E. Chng
31
0
0
12 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Y. Wang
Chaoren Wang
Z. Li
Zhuo Chen
Zhizheng Wu
126
0
0
07 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
A. Hengel
Yuankai Qi
Qingming Huang
129
0
0
02 May 2025
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Tao Jin
Zhou Zhao
VGen
47
0
0
29 Apr 2025
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
W. Liu
Dongdong Lin
42
0
0
29 Apr 2025
Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
P. O'Reilly
Zeyu Jin
Jiaqi Su
Bryan Pardo
26
0
0
15 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
27
0
0
14 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
H. Meng
114
0
0
14 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
54
0
0
11 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
H. Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
49
0
0
29 Mar 2025
Measuring the Robustness of Audio Deepfake Detectors
Measuring the Robustness of Audio Deepfake Detectors
Xiang Li
Pin-Yu Chen
Wenqi Wei
38
0
0
21 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Y. Li
57
0
0
21 Mar 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
37
0
0
20 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
A. Hengel
Yuankai Qi
83
2
0
15 Mar 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
39
0
0
27 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Y. Zhang
Rui Liu
Xiang Yin
Zhou Zhao
Zhou Zhao
69
0
0
26 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
93
0
0
21 Feb 2025
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
Jian Jia
Jingtong Gao
Ben Xue
Junhao Wang
Qingpeng Cai
Quan Chen
Xiangyu Zhao
Peng Jiang
Kun Gai
OffRL
77
0
0
18 Feb 2025
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
Zhengyan Sheng
Zhihao Du
Shiliang Zhang
Zhijie Yan
Yexin Yang
Zhenhua Ling
49
1
0
16 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
J. Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
48
0
0
16 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
55
2
0
05 Feb 2025
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
Ha-Yeong Choi
Jaehan Park
34
0
0
29 Jan 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
56
4
0
28 Jan 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
72
0
0
23 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
J. Zhang
Lu Lu
Y. Wang
Haizhou Li
Z. Wu
AuLLM
80
17
0
17 Jan 2025
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
Li Zhang
Jiyao Liu
Lei Xie
39
0
0
15 Jan 2025
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
Qianniu Chen
Xiaoyang Hao
B. Li
Y. Liu
Li Lu
39
0
0
15 Jan 2025
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Jinzuomu Zhong
Korin Richmond
Zhiba Su
Siqi Sun
53
4
0
10 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
84
3
0
03 Jan 2025
Autoregressive Speech Synthesis with Next-Distribution Prediction
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
167
4
0
22 Dec 2024
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
74
1
0
06 Dec 2024
FreeCodec: A disentangled neural speech codec with fewer tokens
FreeCodec: A disentangled neural speech codec with fewer tokens
Youqiang Zheng
Weiping Tu
Yueteng Kang
Jie Chen
Yike Zhang
Li Xiao
Yuhong Yang
Long Ma
67
1
0
02 Dec 2024
Deepfake Media Generation and Detection in the Generative AI Era: A
  Survey and Outlook
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
F. Khan
Mubarak Shah
89
2
0
29 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot
  TTS and LLM
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Y. Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
M. Zhang
Hao Yang
Jinsong Su
78
0
0
20 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
28
3
0
31 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and
  Low-frequency Character Bigrams
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
23
0
0
23 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
1
0
21 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via
  Multi-Scale Encoding
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang
Fengping Wang
Yicheng Zhong
Huawei Wei
Zhisheng Wang
23
0
0
21 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Md Mubtasim Ahasan
Md Fahim
Tasnim Mohiuddin
A K M Mahbubur Rahman
Aman Chadha
Tariq Iqbal
M. A. Amin
Md. Mofijul Islam
Amin Ahsan Ali
25
0
0
19 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction
  and Speculative Decoding
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
26
0
0
17 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
45
2
0
16 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
116
0
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
125
2
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
  Matching
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
25
52
0
09 Oct 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming
  speech translation
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Rui Zhao
Jinyu Li
Ruchao Fan
Matt Post
38
1
0
07 Oct 2024
123
Next