ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.05646
  4. Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
ArXivPDFHTML

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,102 papers shown
Title
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
Sotheara Leang
Anderson Augusma
E. Castelli
Frédérique Letué
Sethserey Sam
Dominique Vaufreydaz
34
0
0
24 Sep 2024
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
31
0
0
24 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
52
4
0
24 Sep 2024
Voice Conversion-based Privacy through Adversarial Information Hiding
Voice Conversion-based Privacy through Adversarial Information Hiding
J. Webber
O. Watts
G. Henter
Jennifer Williams
Simon King
45
0
0
23 Sep 2024
HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters
HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters
Lauri Juvela
Pablo Pérez Zarazaga
G. Henter
Zofia Malisz
32
0
0
23 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
64
2
0
23 Sep 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGen
DiffM
29
1
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
56
3
0
23 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
38
4
0
22 Sep 2024
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech
  Synthesis
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
Lauri Juvela
Xin Eric Wang
34
3
0
20 Sep 2024
MuCodec: Ultra Low-Bitrate Music Codec
MuCodec: Ultra Low-Bitrate Music Codec
Yaoxun Xu
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Shun Lei
Zhiwei Lin
Zhiyong Wu
35
1
0
20 Sep 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang
Changhao Pan
Wenxiang Guo
Ruiqi Li
Zehan Zhu
...
Yuxin Chen
Chen Yang
Jiecheng Zhou
Xinyu Cheng
Zhou Zhao
26
6
0
20 Sep 2024
Speech-Declipping Transformer with Complex Spectrogram and Learnerble
  Temporal Features
Speech-Declipping Transformer with Complex Spectrogram and Learnerble Temporal Features
Younghoo Kwon
Jung-Woo Choi
30
2
0
19 Sep 2024
NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
Romeo Lanzino
Federico Fontana
Luigi Cinque
Francesco Scarcello
Atsuto Maki
MedIm
34
3
0
18 Sep 2024
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with
  Latent Speech Representations
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
Haopeng Geng
Daisuke Saito
Nobuaki Minematsu
25
0
0
18 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
68
1
0
18 Sep 2024
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition
  on Low-Resource Accented Speech Corpora
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
F. Nespoli
Daniel Barreda
Patrick A. Naylor
28
1
0
17 Sep 2024
High-Resolution Speech Restoration with Latent Diffusion Model
High-Resolution Speech Restoration with Latent Diffusion Model
Tushar Dhyani
Florian Lux
Michele Mancusi
Giorgio Fabbro
Fritz Hohl
Ngoc Thang Vu
DiffM
37
0
0
17 Sep 2024
FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic
  Music Generated via Text-to-Music Models
FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
Luca Comanducci
Paolo Bestagini
Stefano Tubaro
35
7
0
16 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple
  Fine-tuning Approach
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
42
2
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
5
0
16 Sep 2024
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for
  Target Style Audio Generation
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Chenxu Xiong
Ruibo Fu
Shuchen Shi
Zhengqi Wen
Jianhua Tao
...
Chunyu Qiang
Yuankun Xie
Xin Qi
Guanjun Li
Zizheng Yang
DiffM
31
0
0
14 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
36
1
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
194
0
0
14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
33
14
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Henry Li Xinyuan
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
37
3
0
13 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
40
2
0
13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for
  Prosody Modeling
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
36
0
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
48
7
0
13 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme
  representations
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
25
1
0
12 Sep 2024
LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking
LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking
Mayank Kumar Singh
Naoya Takahashi
Wei-Hsiang Liao
Yuki Mitsufuji
50
1
0
12 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource
  languages
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh
Zahra Dehghanian
Hamid R. Rabiee
21
2
0
11 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in
  New Paradigm
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
37
3
0
11 Sep 2024
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Wen-Chin Huang
Szu-Wei Fu
Erica Cooper
Ryandhimas E. Zezario
T. Toda
Hsin-Min Wang
Junichi Yamagishi
Yu Tsao
32
5
0
11 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing
  Yourself
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
41
0
0
10 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for
  Robust Singing Voice Conversion
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
47
0
0
10 Sep 2024
Multi-Source Music Generation with Latent Diffusion
Multi-Source Music Generation with Latent Diffusion
Zhongweiyang Xu
Debottam Dutta
Yu-Lin Wei
Romit Roy Choudhury
DiffM
45
1
0
10 Sep 2024
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and
  Voice Conversion
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
Kyungguen Byun
Jason Filos
Erik Visser
Sunkuk Moon
42
0
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
33
30
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
36
1
0
09 Sep 2024
Vector Quantized Diffusion Model Based Speech Bandwidth Extension
Vector Quantized Diffusion Model Based Speech Bandwidth Extension
Yuan Fang
Jinglin Bai
Jiajie Wang
Xueliang Zhang
27
0
0
09 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech
  Generation
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
35
3
0
06 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving
  Speaker Anonymization
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Henry Li Xinyuan
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
34
2
0
05 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with
  Adversarial Conditional Diffusion Distillation
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
40
0
0
03 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
35
0
0
03 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge
USTC-KXDIGIT System Description for ASVspoof5 Challenge
Y. Chen
Haochen Wu
Nan Jiang
Xiang Xia
Qing Gu
...
Sian Fang
Yan Song
Wu Guo
Lin Liu
Minqiang Xu
43
1
0
03 Sep 2024
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure
  Transformer Blocks and Triplet Discriminative Training
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
yongqiang He
Weiping Wen
21
0
0
03 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for
  Taiwanese Hakka
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
Li-Wei Chen
Hung-Shin Lee
Chen-Chi Chang
VLM
32
0
0
03 Sep 2024
Spectron: Target Speaker Extraction using Conditional Transformer with
  Adversarial Refinement
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
Tathagata Bandyopadhyay
ViT
18
0
0
02 Sep 2024
FLUX that Plays Music
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
84
7
0
01 Sep 2024
Previous
12345...212223
Next