ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2409.10058
  4. Cited By
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

16 September 2024
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
    DiffM
ArXiv (abs)PDFHTML

Papers citing "StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion"

34 / 34 papers shown
Title
Style-Talker: Finetuning Audio Language Model and Style-Based
  Text-to-Speech Model for Fast Spoken Dialogue Generation
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
77
4
0
13 Aug 2024
Autoregressive Speech Synthesis without Vector Quantization
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
136
42
0
11 Jul 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
87
20
0
23 Apr 2024
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
113
37
0
21 Nov 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise
  Filter and Inverse Short Time Fourier Transform
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
74
4
0
18 Sep 2023
InstaFlow: One Step is Enough for High-Quality Diffusion-Based
  Text-to-Image Generation
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
Xingchao Liu
Xiwen Zhang
Jianzhu Ma
Jian Peng
Qiang Liu
154
223
0
12 Sep 2023
High-Fidelity Audio Compression with Improved RVQGAN
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
98
334
0
11 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive
  Bias
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
93
80
0
06 Jun 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency
  Model
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
83
44
0
11 May 2023
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
70
46
0
17 Nov 2022
Classifier-Free Diffusion Guidance
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,963
0
26 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality
  Text-to-Speech
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
87
201
0
13 Jul 2022
Elucidating the Design Space of Diffusion-Based Generative Models
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras
M. Aittala
Timo Aila
S. Laine
DiffM
215
2,022
0
01 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse
  Text-to-Speech Synthesis
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
106
39
0
30 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
  Quality
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
109
221
0
09 May 2022
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
Takaaki Saeki
Detai Xin
Wataru Nakata
Tomoki Koriyama
Shinnosuke Takamichi
Hiroshi Saruwatari
109
217
0
05 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
485
15,734
0
20 Dec 2021
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
230
414
0
04 Dec 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
259
1,898
0
26 Oct 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for
  Natural-Sounding Voice Conversion
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
84
101
0
21 Jul 2021
Conditional Variational Autoencoder with Adversarial Learning for
  End-to-End Text-to-Speech
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
128
898
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
105
175
0
06 Jun 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
107
541
0
13 May 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
54
100
0
02 Apr 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
97
509
0
07 Dec 2020
Score-Based Generative Modeling through Stochastic Differential
  Equations
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffMSyDa
353
6,566
0
26 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
179
1,947
0
12 Oct 2020
Denoising Diffusion Implicit Models
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLMDiffM
289
7,469
0
06 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
105
1,406
0
08 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
229
3,155
0
16 May 2020
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
...
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
75
673
0
17 Dec 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
104
959
0
05 Apr 2019
Decoupled Weight Decay Regularization
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
151
2,151
0
14 Nov 2017
Neural Discrete Representation Learning
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDLSSLOCL
230
5,071
0
02 Nov 2017
1