HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown
Title
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
K. Udagawa
Yuki Saito
Hiroshi Saruwatari
28
6
0
21 Jun 2022
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Yi Wang
Yi Si
44
0
0
20 Jun 2022
Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems
Danwei Cai
Zexin Cai
Ming Li
95
10
0
18 Jun 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Yuto Nishimura
Yuki Saito
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
AI4TS
75
8
0
16 Jun 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ziqian Dai
Jianwei Yu
Yan Wang
Nuo Chen
Yanyao Bian
Guangzhi Li
Deng Cai
Dong Yu
421
8
0
16 Jun 2022
End-to-End Voice Conversion with Information Perturbation
Qicong Xie
Shan Yang
Yinjiao Lei
Linfu Xie
Dan Su
70
7
0
15 Jun 2022
Streaming non-autoregressive model for any-to-many voice conversion
Ziyi Chen
Haoran Miao
Pengyuan Zhang
78
9
0
15 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Wei Ping
Boris Ginsburg
Bryan Catanzaro
Sungroh Yoon
171
255
0
09 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel
M. Behr
Fevziye Irem Eyiokur
Dogucan Yaman
Tuan-Nam Nguyen
Carlos Mullov
Mehmet Arif Demirtas
Alperen Kantarci
Stefan Constantin
H. K. Ekenel
CVBM
69
16
0
09 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang
Zhe Su
Zhou Zhao
Qian Yang
Yi Ren
Jinglin Liu
Zhe Ye
73
5
0
05 Jun 2022
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations
Chang Liu
Zhenhua Ling
Linghui Chen
75
3
0
02 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
114
40
0
30 May 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sungroh Yoon
DiffM
255
53
0
30 May 2022
SUSing: SU-net for Singing Voice Synthesis
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
84
13
0
24 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
80
28
0
20 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
88
8
0
19 May 2022
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
101
16
0
18 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
197
34
0
15 May 2022
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song
Sanghyun Woo
Junhyeok Lee
S. Yang
Hyunjae Cho
Youseong Lee
Dongho Choi
Kang-Wook Kim
CVBM
93
22
0
13 May 2022
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
78
14
0
12 May 2022
Towards Improved Zero-shot Voice Conversion with Conditional DSVAE
Jiachen Lian
Chunlei Zhang
Gopala Krishna Anumanchipalli
Dong Yu
58
23
0
11 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
159
221
0
09 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Yongqian Li
Cheng Yu
Guangzhi Sun
Hua Jiang
Fanglei Sun
Weiqin Zu
Ying Wen
Yang Yang
Jun Wang
71
7
0
09 May 2022
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
Jiatong Shi
Shuai Guo
Tao Qian
Nan Huo
Tomoki Hayashi
...
Xuankai Chang
Hua-Wei Li
Peter Wu
Shinji Watanabe
Qin Jin
VLM
118
27
0
09 May 2022
Parallel Synthesis for Autoregressive Speech Generation
Po-Chun Hsu
Da-Rong Liu
Andy T. Liu
Hung-yi Lee
82
5
0
25 Apr 2022
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
Zhenhui Ye
Zhou Zhao
Yi Ren
Leilei Gan
90
28
0
25 Apr 2022
Improving Self-Supervised Learning-based MOS Prediction Networks
Bálint Gyires-Tóth
Csaba Zainkó
SSL
47
1
0
23 Apr 2022
Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation
Detai Xin
Shinnosuke Takamichi
T. Okamoto
Hisashi Kawai
Hiroshi Saruwatari
36
0
0
22 Apr 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
Rongjie Huang
Max W. Y. Lam
Jun Wang
Dan Su
Dong Yu
Yi Ren
Zhou Zhao
DiffM
78
172
0
21 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
77
113
0
20 Apr 2022
Audio Deep Fake Detection System with Neural Stitching for ADD 2022
Rui Yan
Cheng Wen
Shuran Zhou
Tingwei Guo
Wei Zou
Xiangang Li
49
24
0
19 Apr 2022
Time Domain Adversarial Voice Conversion for ADD 2022
Cheng Wen
Tingwei Guo
Xi Tan
Rui Yan
Shuran Zhou
Chuandong Xie
Wei Zou
Xiangang Li
85
4
0
19 Apr 2022
A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture
Zhe-ming Lu
Mengnan He
Ruixiong Zhang
Caixia Gong
GAN
34
2
0
12 Apr 2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
Haohe Liu
Xubo Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
76
59
0
12 Apr 2022
INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge
Lorenz Diener
Sten Sootla
Solomiya Branets
Ando Saabas
R. Aichner
Ross Cutler
75
43
0
11 Apr 2022
Correcting Mispronunciations in Speech using Spectrogram Inpainting
Talia Ben Simon
Felix Kreuk
Faten Awwad
Jacob T. Cohen
Joseph Keshet
80
2
0
07 Apr 2022
Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder
Juheon Lee
Hyeong-Seok Choi
Kyogu Lee
45
7
0
07 Apr 2022
FFC-SE: Fast Fourier Convolution for Speech Enhancement
Ivan Shchekotov
Pavel Andreev
Oleg Ivanov
Aibek Alanov
Dmitry Vetrov
70
24
0
06 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Jiankun Hu
Zhiyong Wu
Shiyin Kang
Helen Meng
81
10
0
06 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
47
0
0
05 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim
Joanna Hong
Y. Ro
124
54
0
04 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Dan Su
Helen Meng
146
23
0
03 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
118
46
0
01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
91
69
0
01 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
69
0
0
01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
89
33
0
31 Mar 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
Shuai Guo
Jiatong Shi
Tao Qian
Shinji Watanabe
Qin Jin
137
13
0
31 Mar 2022
HiFi-VC: High Quality ASR-Based Voice Conversion
A. Kashkin
I. Karpukhin
S. Shishkin
75
6
0
31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
142
30
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
95
53
0
31 Mar 2022