ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.05646
  4. Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
ArXivPDFHTML

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,107 papers shown
Title
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for
  Text-to-Speech -- A Study between English and Mandarin
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
57
8
0
02 Sep 2023
The FruitShell French synthesis system at the Blizzard 2023 Challenge
The FruitShell French synthesis system at the Blizzard 2023 Challenge
Xin Qi
Xiaopeng Wang
Zhiyong Wang
Wang Liu
Mingming Ding
Shuchen Shi
18
1
0
01 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via
  Vector-Quantized Self-Supervised Speech Representation Learning
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
48
3
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language
  Models
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
44
64
0
31 Aug 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for
  Conversational Text-to-Speech Synthesis
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
Weiqin Li
Shunwei Lei
Qiaochu Huang
Yixuan Zhou
Zhiyong Wu
Shiyin Kang
Helen Meng
32
4
0
31 Aug 2023
Improving Mandarin Prosodic Structure Prediction with Multi-level
  Contextual Information
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Jing Chen
Chang Song
Deyi Tuo
Xixin Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
15
1
0
31 Aug 2023
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
35
16
0
31 Aug 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive
  Text-to-Speech Synthesis
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
29
0
0
30 Aug 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
30
1
0
30 Aug 2023
A Review of Differentiable Digital Signal Processing for Music & Speech
  Synthesis
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
32
21
0
29 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent
  Videos
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
39
5
0
29 Aug 2023
Audio Deepfake Detection: A Survey
Audio Deepfake Detection: A Survey
Jiangyan Yi
Chenglong Wang
J. Tao
Xiaohui Zhang
Chu Yuan Zhang
Yan Zhao
43
45
0
29 Aug 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
42
1
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised
  representations
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
53
3
0
28 Aug 2023
Expressive paragraph text-to-speech synthesis with multi-step
  variational autoencoder
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Xuyuan Li
Zengqiang Shang
Peiyang Shi
Hua Hua
Jian Liu
Pengyuan Zhang
45
0
0
25 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
40
38
0
24 Aug 2023
Exploiting Time-Frequency Conformers for Music Audio Enhancement
Exploiting Time-Frequency Conformers for Music Audio Enhancement
Yunkee Chae
Junghyun Koo
Sungho Lee
Kyogu Lee
40
3
0
24 Aug 2023
AdVerb: Visually Guided Audio Dereverberation
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
30
11
0
23 Aug 2023
Audio Generation with Multiple Conditional Diffusion Model
Audio Generation with Multiple Conditional Diffusion Model
Zhifang Guo
Jianguo Mao
Ruijie Tao
Long Yan
Kazushige Ouchi
Hong Liu
Xiangdong Wang
DiffM
26
11
0
23 Aug 2023
Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy
  Challenge Baseline B1
Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy Challenge Baseline B1
Ünal Ege Gaznepoglu
Nils Peters
39
0
0
22 Aug 2023
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice
  Conversion
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion
Yimin Deng
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
36
7
0
21 Aug 2023
Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based
  Singing Technique Conversion
Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion
Tung-Cheng Su
Yung-Chuan Chang
Yi-Wen Liu
21
0
0
19 Aug 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided
  Speaker Embedding
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
DiffM
29
19
0
15 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using
  1D-2D CNN
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
38
4
0
14 Aug 2023
BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music
  Super-Resolution
BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution
Yenan Zhang
Hiroshi Watanabe
24
0
0
12 Aug 2023
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Siyuan Shan
Yang Li
A. Banerjee
Junier B. Oliva
32
4
0
11 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
47
224
0
10 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
72
37
0
09 Aug 2023
A Systematic Exploration of Joint-training for Singing Voice Synthesis
A Systematic Exploration of Joint-training for Singing Voice Synthesis
Yuning Wu
Yifeng Yu
Jiatong Shi
Tao Qian
Qin Jin
51
5
0
05 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text
  Representation Learning with Unit-to-Unit Translation
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
47
10
0
03 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual
  Discriminators for High-Fidelity Multi-Speaker TTS
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
30
1
0
03 Aug 2023
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using
  Beat-Synchronous Mixup Strategies
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
K. Chen
Yusong Wu
Haohe Liu
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
DiffM
44
75
0
03 Aug 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
27
21
0
02 Aug 2023
SALTTS: Leveraging Self-Supervised Speech Representations for improved
  Text-to-Speech Synthesis
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis
Ramanan Sivaguru
Vasista Sai Lodagala
S. Umesh
29
2
0
02 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
38
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive
  Speech Synthesis with Prosody Conditional Adversarial Training
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
33
14
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech
  with Adversarial Learning and Architecture Design
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
37
0
31 Jul 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
30
9
0
30 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context
  Information for Expressive Speech Synthesis
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
40
7
0
29 Jul 2023
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based
  onFlexible Location Gradient Reversal Layer
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based onFlexible Location Gradient Reversal Layer
Md. Asif Jalal
Pablo Peso Parada
Jisi Zhang
Karthikeyan P. Saravanan
Mete Ozay
Myoungji Han
Jung In Lee
Seokyeong Jung
28
1
0
25 Jul 2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder
  for low rate speech coding
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Youqiang Zheng
Li Xiao
Weiping Tu
Yuhong Yang
Xinmeng Xu
43
6
0
25 Jul 2023
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency
  of Full-band Magnitude and Phase
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase
Yoshiki Masuyama
Natsuki Ueno
Nobutaka Ono
32
1
0
23 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
25
2
0
20 Jul 2023
An analysis on the effects of speaker embedding choice in non
  auto-regressive TTS
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
Adriana Stan
Johannah O'Mahony
43
0
0
19 Jul 2023
Vocoder drift compensation by x-vector alignment in speaker
  anonymisation
Vocoder drift compensation by x-vector alignment in speaker anonymisation
Michele Panariello
Massimiliano Todisco
Nicholas W. D. Evans
42
2
0
17 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
46
45
0
14 Jul 2023
Rhythm Modeling for Voice Conversion
Rhythm Modeling for Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
42
5
0
12 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous
  Speech Synthesis
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
55
6
0
11 Jul 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023
  Speech-to-Speech Translation Task
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
Kun Song
Yinjiao Lei
Pei-Ning Chen
Yiqing Cao
Kun Wei
Yongmao Zhang
Linfu Xie
Ning Jiang
Guoqing Zhao
29
1
0
10 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic
  Literature Review
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
J. Barnett
34
25
0
07 Jul 2023
Previous
123...111213...212223
Next