Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations
Neha Sahipjohn
Neil Shah
Vishal Tambrahalli
Vineet Gandhi
102
2
0
03 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
65
6
0
03 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
54
2
0
03 Jul 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
85
4
0
29 Jun 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
107
83
0
29 Jun 2023
Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
J. Duret
Titouan Parcollet
Yannick Esteve
59
4
0
29 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
78
22
0
28 Jun 2023
Two-Stage Voice Anonymization for Enhanced Privacy
F. Nespoli
Daniel Barreda
Joerg Bitzer
Patrick A. Naylor
75
3
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
75
4
0
27 Jun 2023
CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation
Yuhao Cui
Xiongwei Wang
Zhongzhou Zhao
Wei Zhou
Haiqing Chen
66
1
0
27 Jun 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
Tomoki Toda
136
50
0
26 Jun 2023
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing
Lester Phillip Violeta
Tomoki Toda
61
2
0
24 Jun 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu
Shijun Wang
Ning Chen
DRL
80
2
0
21 Jun 2023
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
47
3
0
21 Jun 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Ammar Abbas
S. Karlapati
Bastian Schnell
Penny Karanasou
M. G. Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
124
3
0
20 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D.Plumbley
Wenwu Wang
DiffM
121
10
0
17 Jun 2023
FALL-E: A Foley Sound Synthesis Model and Strategies
Minsung Kang
Sangshin Oh
Hyeongi Moon
Kyungyun Lee
Ben Sangbae Chon
59
4
0
16 Jun 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Shivam Mehta
Siyang Wang
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
DiffM
103
14
0
15 Jun 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Zheng Liang
Zheshu Song
Ziyang Ma
Chenpeng Du
K. Yu
Xie Chen
76
5
0
14 Jun 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction
Wenzhe Liu
Yupeng Shi
Jun Chen
Wei Rao
Shulin He
Andong Li
Yannan Wang
Zhiyong Wu
56
6
0
14 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
149
127
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
125
44
0
13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
70
4
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
73
9
0
12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
144
338
0
11 Jun 2023
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
Dominik Wagner
Ilja Baumann
Tobias Bocklet
70
1
0
10 Jun 2023
Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech
Shijun Wang
Jón Guðnason
Damian Borth
79
3
0
09 Jun 2023
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
Hao Liu
Tao Wang
Jie Cao
Ran He
J. Tao
DiffM
74
4
0
09 Jun 2023
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis
Junhyeok Lee
Hyeonuk Nam
Yong-Hwa Park
56
4
0
08 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
96
6
0
07 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
105
80
0
06 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
82
1
0
06 Jun 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
Alexandra Antonova
Evelina Bakhturina
Boris Ginsburg
KELM
58
6
0
04 Jun 2023
In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis
N. Prabhu
N. Lehmann-Willenbrock
Timo Gerkmann
78
3
0
02 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
73
2
0
02 Jun 2023
Exploration on HuBERT with Multiple Resolutions
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Hongyu Gong
J. Pino
Shinji Watanabe
132
9
0
01 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
157
104
0
01 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
A. Iashchenko
Pavel Andreev
Ivan Shchekotov
Nicholas Babaev
Dmitry Vetrov
DiffM
101
2
0
01 Jun 2023
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DiffM
106
27
0
01 Jun 2023
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
OffRL
63
0
0
01 Jun 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
98
14
0
31 May 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
J. Choi
Minsu Kim
Y. Ro
93
26
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
77
0
0
31 May 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu
Yongmao Zhang
Yinjiao Lei
Yunlin Chen
Rui Wang
Zhifei Li
Linfu Xie
95
42
0
31 May 2023
Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
52
4
0
30 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
88
27
0
30 May 2023
Towards single integrated spoofing-aware speaker verification embeddings
Sung Hwan Mun
Hye-jin Shim
Hemlata Tak
Xin Wang
Xuechen Liu
...
Junichi Yamagishi
Nicholas W. D. Evans
Tomi Kinnunen
N. Kim
Jee-weon Jung
152
12
0
30 May 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Ziqian Wang
Pengcheng Guo
Linfu Xie
AAML
68
1
0
30 May 2023
Voice Conversion With Just Nearest Neighbors
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
117
61
0
30 May 2023
Speaker anonymization using orthogonal Householder neural network
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
BDL
74
21
0
30 May 2023
Previous
1
2
3
...
13
14
15
...
22
23
24
Next