HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
75
0
0
03 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
Li-Wei Chen
Hung-Shin Lee
Chen-Chi Chang
VLM
151
0
0
03 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
116
0
0
03 Sep 2024
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
Tathagata Bandyopadhyay
ViT
97
0
0
02 Sep 2024
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
143
9
0
01 Sep 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
460
1
0
30 Aug 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
87
1
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
158
45
0
29 Aug 2024
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge
You Zhang
Yongyi Zang
Jiatong Shi
Ryuichi Yamamoto
Tomoki Toda
Zhiyao Duan
98
9
0
28 Aug 2024
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation
Kai Chen
Jiaqi Su
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Zeyu Jin
81
2
0
28 Aug 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
109
1
0
28 Aug 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard
Wonjune Kang
Margaret Hughes
Deb Roy
95
1
0
26 Aug 2024
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
Yidi Li
Yihan Li
Yixin Guo
Bin Ren
Zhenhuan Xu
Hao Guo
Hong Liu
N. Sebe
163
0
0
26 Aug 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
95
3
0
23 Aug 2024
Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music
N. Shikarpur
Krishna Maneesha Dendukuri
Yusong Wu
Antoine Caillon
Cheng-Zhi Anna Huang
37
1
0
22 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
82
0
0
21 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
88
1
0
20 Aug 2024
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Yin-Jyun Luo
K. Cheuk
Woosung Choi
Toshimitsu Uesaka
Keisuke Toyama
...
Chieh-Hsin Lai
Yuhta Takida
Wei-Hsiang Liao
Simon Dixon
Yuki Mitsufuji
CoGe
114
2
0
20 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
59
3
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
99
50
0
16 Aug 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
95
2
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD, DiffM, AI4TS
116
6
0
14 Aug 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
64
0
0
13 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
70
2
0
12 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
121
2
0
12 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
99
5
0
09 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
64
2
0
08 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
79
6
0
08 Aug 2024
MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy Risks and Maximizing Utility in Audio-Visual Data Archiving
B. Owoyele
Martin Schilling
Rohan Sawahn
Niklas Kaemer
Pavel Zherebenkov
Bhuvanesh Verma
Wim Pouw
Gerard de Melo
109
0
0
06 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
143
0
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
77
0
0
05 Aug 2024
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
Yansheng Li
Tingzhu Wang
Kang Wu
Linlin Wang
Xin Guo
Wenbin Wang
107
0
0
27 Jul 2024
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
Mahmoud Salhab
H. Harmanani
31
0
0
26 Jul 2024
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
N. Shah
Shirish S. Karande
Vineet Gandhi
72
1
0
26 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
81
1
0
24 Jul 2024
Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Ying-Shuo Lee
Yueh-Po Peng
Jui-Te Wu
Ming Cheng
Li Su
Yi-Hsuan Yang
72
1
0
23 Jul 2024
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
Wataru Nakata
Kentaro Seki
Hitomi Yanaka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
AuLLM
74
2
0
22 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
118
6
0
22 Jul 2024
Learning Physics for Unveiling Hidden Earthquake Ground Motions via Conditional Generative Modeling
Pu Ren
R. Nakata
Maxime Lacour
Ilan Naiman
Nori Nakata
...
Osman Asif Malik
Dmitriy Morozov
Omri Azencot
N. Benjamin Erichson
Michael W. Mahoney
AI4CE
79
9
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
87
0
0
19 Jul 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
61
2
0
18 Jul 2024
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang
Jiahong Zhang
Yong Ren
Man Yao
Richard D. Shang
Boxing Xu
Guoqi Li
DiffM
76
2
0
17 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
91
3
0
16 Jul 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
113
10
0
15 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
182
43
0
11 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM, VGen
121
12
0
10 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
116
0
0
08 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen, DiffM
93
15
0
08 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
107
1
0
08 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
91
3
0
07 Jul 2024