ResearchTrend.AI
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
106
18
0
01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
53
28
0
01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
91
0
0
31 Oct 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents
Yongmao Zhang
Zhichao Wang
Pei-Yin Yang
Hongshen Sun
Zhisheng Wang
Linfu Xie
90
6
0
31 Oct 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
84
2
0
31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
48
7
0
31 Oct 2022
Magnitude or Phase? A Two Stage Algorithm for Dereverberation
Ayal Schwartz
Sharon Gannot
Shlomo E. Chazan
84
0
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
106
13
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
57
0
0
28 Oct 2022
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit
Ryuichi Yamamoto
Reo Yoneyama
Tomoki Toda
474
12
0
28 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
101
17
0
28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
86
11
0
28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
111
27
0
27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
134
113
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
106
7
0
27 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
383
8
0
27 Oct 2022
Articulation GAN: Unsupervised modeling of articulatory learning
Gašper Beguš
Alan Zhou
Peter Wu
Gopala K Anumanchipalli
GAN
140
8
0
27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo
Fenglong Xie
Xixin Wu
Hui Lu
Helen Meng
332
3
0
27 Oct 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
100
8
0
26 Oct 2022
Full-band General Audio Synthesis with Score-based Diffusion
Santiago Pascual
Gautam Bhattacharya
Chunghsin Yeh
Jordi Pons
Joan Serrà
DiffM
69
35
0
26 Oct 2022
Cover Reproducible Steganography via Deep Generative Models
Kejiang Chen
Hang Zhou
Yaofei Wang
Meng Li
Weiming Zhang
Neng H. Yu
DiffM
77
13
0
26 Oct 2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge
Yuhao Liang
Pei-Ning Chen
F. Yu
Xinfa Zhu
Tianyi Xu
Linfu Xie
63
0
0
26 Oct 2022
EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones
J. Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
48
10
0
25 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
68
5
0
25 Oct 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE
Hui Lu
Disong Wang
Xixin Wu
Zhiyong Wu
Xunying Liu
Helen M. Meng
DRL
123
10
0
25 Oct 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
136
674
0
24 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
60
0
0
24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
100
7
0
23 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
107
23
0
21 Oct 2022
Boomerang: Local sampling on image manifolds using diffusion models
Lorenzo Luzi
P. Mayer
Josue Casco-Rodriguez
Ali Siahkoohi
Richard G. Baraniuk
DiffM
115
20
0
21 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
126
8
0
20 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
Chihiro Watanabe
Hirokazu Kameoka
DRL
122
0
0
20 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang
Junichi Yamagishi
116
43
0
19 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Detai Xin
Hiroshi Saruwatari
71
3
0
18 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka
Shinnosuke Takamichi
Keisuke Imoto
Yuki Okamoto
Kazuki Fujii
Hiroshi Saruwatari
DiffM
80
8
0
17 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
62
14
0
14 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
82
16
0
14 Oct 2022
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy
Sarina Meyer
Pascal Tilli
Pavel Denisov
Florian Lux
Julia Koch
Ngoc Thang Vu
85
32
0
13 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
79
5
0
13 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
83
10
0
12 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
87
16
0
12 Oct 2022
JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
Yueh-Kao Wu
Ching-Yu Chiu
Yi-Hsuan Yang
ViT
79
15
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
106
6
0
12 Oct 2022
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models
Matthew Baas
Herman Kamper
DiffM
88
8
0
11 Oct 2022
ConchShell: A Generative Adversarial Networks that Turns Pictures into Piano Music
Wanshu Fan
Yu-Chuan Su
Yuxin Huang
GAN
38
2
0
11 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
115
57
0
06 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
73
0
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
119
19
0
05 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
123
30
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
156
309
0
30 Sep 2022