HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
arXiv: 2010.05646
Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
Showing 50 of 1,154 citing papers
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
B. Halpern
Wen-Chin Huang
Lester Phillip Violeta
R.J.J.H. van Son
Tomoki Toda
127
2
0
04 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Roi Benita
Michael Elad
Joseph Keshet
DiffM
130
8
0
02 Oct 2023
Mirror Diffusion Models for Constrained and Watermarked Generation
Guan-Horng Liu
T. Chen
Evangelos A. Theodorou
Molei Tao
DiffM
81
23
0
02 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
98
14
0
02 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
84
0
0
29 Sep 2023
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
Wenhao Guan
Qi Su
Haodong Zhou
Shiyu Miao
Xingjia Xie
Lin Li
Q. Hong
DiffM
70
18
0
29 Sep 2023
Collaborative Watermarking for Adversarial Speech Synthesis
Lauri Juvela
Xin Wang
106
14
0
26 Sep 2023
BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
66
5
0
25 Sep 2023
VoiceLDM: Text-to-Speech with Environmental Context
Yeong-Won Lee
In-won Yeon
Juhan Nam
Joon Son Chung
VLM
DiffM
75
15
0
24 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
78
6
0
22 Sep 2023
The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang
Zhuo Li
Jingze Lu
Hua Hua
Wenchao Wang
Pengyuan Zhang
90
21
0
21 Sep 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
135
4
0
21 Sep 2023
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai
Trung D. Q. Dang
Dung N. Tran
K. Koishida
Somayeh Sojoudi
DiffM
178
23
0
19 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
Tomoki Toda
70
5
0
18 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
103
4
0
18 Sep 2023
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers
Peter Ochieng
DiffM
61
0
0
18 Sep 2023
Augmenting text for spoken language understanding with Large Language Models
Roshan Sharma
Suyoun Kim
Daniel Lazar
Trang Le
Akshat Shrivastava
Kwanghoon Ahn
Piyush Kansal
Leda Sari
Ozlem Kalinli
Michael Seltzer
99
2
0
17 Sep 2023
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo
Seth Z. Zhao
Jiachen Lian
Gopala Anumanchipalli
Gerald Friedland
71
3
0
16 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huatian Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
90
3
0
16 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
85
6
0
15 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
74
1
0
15 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
106
36
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
84
4
0
15 Sep 2023
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
Sarina Meyer
Xiaoxiao Miao
Ngoc Thang Vu
131
6
0
14 Sep 2023
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
N. Prabhu
Bunlong Lay
Simon Welker
N. Lehmann-Willenbrock
Timo Gerkmann
DiffM
84
3
0
14 Sep 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Sipan Li
Songxiang Liu
Lu Zhang
Xiang Li
Yanyao Bian
Chao Weng
Zhiyong Wu
Helen Meng
62
2
0
14 Sep 2023
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
Arnab Das
Suhita Ghosh
Tim Polzehl
Sebastian Stober
72
4
0
14 Sep 2023
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh
Arnab Das
Yamini Sinha
Ingo Siegert
Tim Polzehl
Sebastian Stober
64
4
0
14 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
62
4
0
14 Sep 2023
SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu
Yong-mei Xu
Vinay Kothapally
Heming Wang
Muqiao Yang
Dong Yu
46
1
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
102
63
0
14 Sep 2023
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
133
69
0
14 Sep 2023
AudioSR: Versatile Audio Super-resolution at Scale
Haohe Liu
Ke Chen
Qiao Tian
Wenwu Wang
Mark D. Plumbley
DiffM
51
25
0
13 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
94
8
0
13 Sep 2023
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao
Xin Eric Wang
Erica Cooper
Junichi Yamagishi
Nicholas W. D. Evans
Massimiliano Todisco
J. Bonastre
Mickael Rouvier
62
5
0
12 Sep 2023
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
Xin Wang
Junichi Yamagishi
SyDa
109
29
0
12 Sep 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
71
1
0
11 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
115
47
0
10 Sep 2023
Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer
Ashwin Pillay
Sage J. Betko
Ari Liloia
Hao Chen
Ankit Shah
20
0
0
08 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
71
2
0
08 Sep 2023
A Two-Stage Training Framework for Joint Speech Compression and Enhancement
Jiayi Huang
Zeyu Yan
Wenbin Jiang
Fei Wen
71
1
0
08 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
129
96
0
06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
84
11
0
06 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Yiming Wu
CoGe
DRL
115
0
0
06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
81
2
0
06 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
75
2
0
06 Sep 2023
FSD: An Initial Chinese Dataset for Fake Song Detection
Yuankun Xie
Jingjing Zhou
Xiaolin Lu
Zhenghao Jiang
Yuxin Yang
Haonan Cheng
Long Ye
90
15
0
05 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
91
4
0
02 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
96
9
0
02 Sep 2023
The FruitShell French synthesis system at the Blizzard 2023 Challenge
Xin Qi
Xiaopeng Wang
Zhiyong Wang
Wang Liu
Mingming Ding
Shuchen Shi
30
1
0
01 Sep 2023