Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.03153
Cited By
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
6 June 2021
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation"
50 / 102 papers shown
Title
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
91
2
0
15 Mar 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
98
0
0
21 Feb 2025
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Mohammad Jahid Ibna Basher
Md. Kowsher
Md Saiful Islam
R. N. Nandi
Nusrat Jahan Prottasha
...
Tareq Al Muntasir
Shammur A. Chowdhury
Firoj Alam
Niloofar Yousefi
O. Garibay
57
0
0
09 Feb 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
Tian-Hao Zhang
Jiawei Zhang
Jun Wang
Xinyuan Qian
Xu-cheng Yin
CVBM
49
0
0
02 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
40
0
0
31 Dec 2024
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
Xiangheng He
Junjie Chen
Zixing Zhang
Björn W. Schuller
83
0
0
16 Dec 2024
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
92
6
0
12 Dec 2024
CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Yuke Li
Xinfa Zhu
Hanzhao Li
J.-H. Yao
WenJie Tian
XiPeng Yang
Yunlin Chen
Zhifei Li
Lei Xie
DiffM
66
0
0
28 Nov 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
47
2
0
16 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLM
MLLM
VLM
82
21
0
26 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
5
0
16 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
150
0
0
14 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
36
0
0
13 Sep 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
37
1
0
20 Aug 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Shuchen Shi
...
Yuankun Xie
Yukun Liu
Guanjun Li
Xuefei Liu
Yongwei Li
43
1
0
20 Aug 2024
Content and Style Aware Audio-Driven Facial Animation
Qingju Liu
Hyeongwoo Kim
Gaurav Bharaj
DiffM
43
1
0
13 Aug 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
41
2
0
27 Jun 2024
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
Tzu-Yun Hung
Jui-Te Wu
Yu-Chia Kuo
Yo-Wei Hsiao
Ting-Wei Lin
Li Su
24
0
0
26 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
40
0
0
24 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
41
5
0
21 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
38
2
0
09 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
MoE
40
0
0
05 Jun 2024
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
Ruiqi Li
Rongjie Huang
Yongqi Wang
Zhiqing Hong
Zhou Zhao
40
1
0
04 Jun 2024
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen
Xin Yan
Yihang Chen
Siyuan Cen
Qinwei Ma
Haoyu Zhen
Kaizhi Qian
Lie Lu
Chuang Gan
38
0
0
30 May 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Haoxiang Shi
Jianzong Wang
Xulong Zhang
Ning Cheng
Jun Yu
Jing Xiao
36
2
0
27 May 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
31
9
0
16 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
83
0
09 May 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
42
10
0
28 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
43
2
0
06 Apr 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
31
2
0
06 Mar 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
Anton Van Den Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
46
12
0
20 Feb 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
30
0
0
25 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
52
1
0
16 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
37
2
0
04 Jan 2024
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
26
8
0
21 Dec 2023
Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
Hui Fu
Zeqing Wang
Ke Gong
Keze Wang
Tianshui Chen
Haojie Li
Haifeng Zeng
Xiandong Li
43
10
0
18 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
28
8
0
17 Dec 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang
Rongjie Huang
Ruiqi Li
Jinzheng He
Yan Xia
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
VLM
26
17
0
17 Dec 2023
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
Sining Sun
Qing Yang
SSL
23
8
0
16 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
32
31
0
21 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
21
24
0
08 Nov 2023
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Xinfa Zhu
Yuke Li
Yinjiao Lei
Ning Jiang
Guoqing Zhao
Lei Xie
25
0
0
26 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
33
3
0
06 Oct 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
33
3
0
06 Oct 2023
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis
Yu Gu
Yianrao Bian
Guangzhi Lei
Chao Weng
Dan Su
DiffM
15
2
0
22 Sep 2023
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts
Jixun Yao
Yuguang Yang
Yinjiao Lei
Ziqian Ning
Yanni Hu
Y. Pan
Jingjing Yin
Hongbin Zhou
Heng Lu
Linfu Xie
DiffM
35
19
0
17 Sep 2023
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Hyun-Seo Shin
Ju-Sung Heo
Ju-ho Kim
Chanmann Lim
Wonbin Kim
Ha-Jin Yu
25
5
0
15 Sep 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
37
1
0
28 Aug 2023
1
2
3
Next