
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown
Improving curriculum learning for target speaker extraction with synthetic speakers
Yun Liu
Xuechen Liu
Junichi Yamagishi
76
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
213
26
0
01 Oct 2024
MIMII-Gen: Generative Modeling Approach for Simulated Evaluation of Anomalous Sound Detection System
Harsh Purohit
Tomoya Nishida
Kota Dohi
Takashi Endo
Yohei Kawaguchi
DiffM
70
1
0
27 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
77
1
0
25 Sep 2024
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Pin-Jui Ku
Alexander H. Liu
Roman Korostik
Sung-Feng Huang
Szu-Wei Fu
Ante Jukić
81
4
0
24 Sep 2024
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
Sotheara Leang
Anderson Augusma
E. Castelli
Frédérique Letué
Sethserey Sam
Dominique Vaufreydaz
61
0
0
24 Sep 2024
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
60
0
0
24 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM, VLM
192
8
0
24 Sep 2024
Voice Conversion-based Privacy through Adversarial Information Hiding
J. Webber
O. Watts
G. Henter
Jennifer Williams
Simon King
81
0
0
23 Sep 2024
HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters
Lauri Juvela
Pablo Pérez Zarazaga
G. Henter
Zofia Malisz
60
0
0
23 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
100
4
0
23 Sep 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGen, DiffM
89
1
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
133
5
0
23 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM, SSL
110
5
0
22 Sep 2024
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
Lauri Juvela
Xin Eric Wang
98
4
0
20 Sep 2024
MuCodec: Ultra Low-Bitrate Music Codec
Yaoxun Xu
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Shun Lei
Zhiwei Lin
Zhiyong Wu
73
3
0
20 Sep 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang
Changhao Pan
Wenxiang Guo
Ruiqi Li
Zehan Zhu
...
Yuxin Chen
Chen Yang
Jiecheng Zhou
Xinyu Cheng
Zhou Zhao
92
10
0
20 Sep 2024
Speech-Declipping Transformer with Complex Spectrogram and Learnerble Temporal Features
Younghoo Kwon
Jung-Woo Choi
123
2
0
19 Sep 2024
NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
Romeo Lanzino
Federico Fontana
Luigi Cinque
Francesco Scarcello
Atsuto Maki
MedIm
66
4
0
18 Sep 2024
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
Haopeng Geng
Daisuke Saito
Nobuaki Minematsu
83
0
0
18 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
162
4
0
18 Sep 2024
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
F. Nespoli
Daniel Barreda
Patrick A. Naylor
67
1
0
17 Sep 2024
High-Resolution Speech Restoration with Latent Diffusion Model
Tushar Dhyani
Florian Lux
Michele Mancusi
Giorgio Fabbro
Fritz Hohl
Ngoc Thang Vu
DiffM
147
0
0
17 Sep 2024
FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
Luca Comanducci
Paolo Bestagini
Stefano Tubaro
71
7
0
16 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
84
3
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
114
5
0
16 Sep 2024
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Chenxu Xiong
Ruibo Fu
Shuchen Shi
Zhengqi Wen
Jianhua Tao
...
Chunyu Qiang
Yuankun Xie
Xin Qi
Guanjun Li
Zizheng Yang
DiffM
82
0
0
14 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
124
2
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
471
0
0
14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wei Dong
89
16
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Lin Zhang
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
50
4
0
13 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
66
2
0
13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
168
0
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen, DiffM
103
9
0
13 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
51
2
0
12 Sep 2024
LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking
Mayank Kumar Singh
Naoya Takahashi
Wei-Hsiang Liao
Yuki Mitsufuji
82
1
0
12 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh
Zahra Dehghanian
Hamid R. Rabiee
72
2
0
11 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
86
3
0
11 Sep 2024
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Wen-Chin Huang
Szu-Wei Fu
Erica Cooper
Ryandhimas E. Zezario
Tomoki Toda
Hsin-Min Wang
Junichi Yamagishi
Yu Tsao
91
12
0
11 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
71
0
0
10 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
100
1
0
10 Sep 2024
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
Kyungguen Byun
Jason Filos
Erik Visser
Sunkuk Moon
80
0
0
10 Sep 2024
Multi-Source Music Generation with Latent Diffusion
Zhongweiyang Xu
Debottam Dutta
Yu-Lin Wei
Romit Roy Choudhury
DiffM
129
2
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
123
54
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
113
2
0
09 Sep 2024
Vector Quantized Diffusion Model Based Speech Bandwidth Extension
Yuan Fang
Jinglin Bai
Jiajie Wang
Xueliang Zhang
78
1
0
09 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
96
3
0
06 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Lin Zhang
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
47
3
0
05 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
80
2
0
03 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge
Y. Chen
Haochen Wu
Nan Jiang
Xiang Xia
Qing Gu
...
Sian Fang
Yan Song
Wu Guo
Lin Liu
Minqiang Xu
89
1
0
03 Sep 2024