ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.05646
  4. Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
ArXivPDFHTML

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,107 papers shown
Title
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
46
3
0
06 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
48
24
0
04 Oct 2023
Improving severity preservation of healthy-to-pathological voice
  conversion with global style tokens
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
B. Halpern
Wen-Chin Huang
Lester Phillip Violeta
R.J.J.H. van Son
Tomoki Toda
40
2
0
04 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform
  Generation
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Roi Benita
Michael Elad
Joseph Keshet
DiffM
38
7
0
02 Oct 2023
Mirror Diffusion Models for Constrained and Watermarked Generation
Mirror Diffusion Models for Constrained and Watermarked Generation
Guan-Horng Liu
T. Chen
Evangelos A. Theodorou
Molei Tao
DiffM
25
22
0
02 Oct 2023
Towards human-like spoken dialogue generation between AI agents from
  written dialogue
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
31
13
0
02 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR
  Customization
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
56
0
0
29 Sep 2023
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
Wenhao Guan
Qi Su
Haodong Zhou
Shiyu Miao
Xingjia Xie
Lin Li
Q. Hong
DiffM
20
13
0
29 Sep 2023
Collaborative Watermarking for Adversarial Speech Synthesis
Collaborative Watermarking for Adversarial Speech Synthesis
Lauri Juvela
Xin Wang
34
13
0
26 Sep 2023
BiSinger: Bilingual Singing Voice Synthesis
BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
31
5
0
25 Sep 2023
VoiceLDM: Text-to-Speech with Environmental Context
VoiceLDM: Text-to-Speech with Environmental Context
Yeong-Won Lee
In-won Yeon
Juhan Nam
Joon Son Chung
VLM
DiffM
27
10
0
24 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice
  Synthesizer Trained on Monolingual Singers
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
27
6
0
22 Sep 2023
The Impact of Silence on Speech Anti-Spoofing
The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang
Zhuo Li
Jingze Lu
Hua Hua
Wenchao Wang
Pengyuan Zhang
42
19
0
21 Sep 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and
  Prosody Consistency
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
34
3
0
21 Sep 2023
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation
  with Consistency Distillation
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai
Trung D. Q. Dang
Dung N. Tran
K. Koishida
Somayeh Sojoudi
DiffM
52
22
0
19 Sep 2023
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data
  Distribution Recovery Steps Via Content Transfer
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer
Peter Ochieng
DiffM
33
0
0
18 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust
  Linguistic Encoders
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
Tomoki Toda
22
3
0
18 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise
  Filter and Inverse Short Time Fourier Transform
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
38
4
0
18 Sep 2023
Augmenting text for spoken language understanding with Large Language
  Models
Augmenting text for spoken language understanding with Large Language Models
Roshan Sharma
Suyoun Kim
Daniel Lazar
Trang Le
Akshat Shrivastava
Kwanghoon Ahn
Piyush Kansal
Leda Sari
Ozlem Kalinli
Michael Seltzer
36
2
0
17 Sep 2023
Enhancing GAN-Based Vocoders with Contrastive Learning Under
  Data-limited Condition
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo
Seth Z. Zhao
Jiachen Lian
Gopala Anumanchipalli
Gerald Friedland
24
2
0
16 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained
  Generative Methods for Speech Enhancement in Adverse Conditions
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huatian Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
37
3
0
16 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with
  Vision-Language Pre-training and Multi-modal Tokens
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
26
6
0
15 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for
  Robust Polyglot Text-To-Speech
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
36
1
0
15 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
25
33
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic
  and acoustic features
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
26
3
0
15 Sep 2023
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy
  Research
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
Sarina Meyer
Xiaoxiao Miao
Ngoc Thang Vu
42
6
0
14 Sep 2023
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel
  and In-the-wild Data
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
N. Prabhu
Bunlong Lay
Simon Welker
N. Lehmann-Willenbrock
Timo Gerkmann
DiffM
21
3
0
14 Sep 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and
  Periodic Inductive Bias
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Sipan Li
Songxiang Liu
Lu Zhang
Xiang Li
Yanyao Bian
Chao Weng
Zhiyong Wu
Helen Meng
41
2
0
14 Sep 2023
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep
  Embeddings
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
Arnab Das
Suhita Ghosh
Tim Polzehl
Sebastian Stober
40
4
0
14 Sep 2023
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel
  Emotion-Preserving Voice Conversion
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh
Arnab Das
Yamini Sinha
Ingo Siegert
Tim Polzehl
Sebastian Stober
27
4
0
14 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
36
3
0
14 Sep 2023
SpatialCodec: Neural Spatial Speech Coding
SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu
Yong-mei Xu
Vinay Kothapally
Heming Wang
Muqiao Yang
Dong Yu
29
1
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit
  for Neural Speech Codec
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
34
55
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech
  recognition/synthesis and speech/text continuation tasks
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
29
58
0
14 Sep 2023
AudioSR: Versatile Audio Super-resolution at Scale
AudioSR: Versatile Audio Super-resolution at Scale
Haohe Liu
Ke Chen
Qiao Tian
Wenwu Wang
Mark D. Plumbley
DiffM
18
21
0
13 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in
  Speech Waveforms
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
23
3
0
13 Sep 2023
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao
Xin Eric Wang
Erica Cooper
Junichi Yamagishi
Nicholas W. D. Evans
Massimiliano Todisco
J. Bonastre
Mickael Rouvier
22
5
0
12 Sep 2023
Can large-scale vocoded spoofed data improve speech spoofing
  countermeasure with a self-supervised front end?
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
Xin Wang
Junichi Yamagishi
SyDa
58
23
0
12 Sep 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of
  SSWP
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
23
1
0
11 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
35
37
0
10 Sep 2023
Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer
Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer
Ashwin Pillay
Sage J. Betko
Ari Liloia
Hao Chen
Ankit Shah
14
0
0
08 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
29
2
0
08 Sep 2023
A Two-Stage Training Framework for Joint Speech Compression and
  Enhancement
A Two-Stage Training Framework for Joint Speech Compression and Enhancement
Jiayi Huang
Zeyu Yan
Wenbin Jiang
Fei Wen
32
0
0
08 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
29
73
0
06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial
  Network
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
23
11
0
06 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in
  Music Audio Signals
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Yiming Wu
CoGe
DRL
41
0
0
06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge
  2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
23
2
0
06 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any
  Voice Conversion using Only Speech Data
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
28
2
0
06 Sep 2023
FSD: An Initial Chinese Dataset for Fake Song Detection
FSD: An Initial Chinese Dataset for Fake Song Detection
Yuankun Xie
Jingjing Zhou
Xiaolin Lu
Zhenghao Jiang
Yuxin Yang
Haonan Cheng
Long Ye
29
14
0
05 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
37
4
0
02 Sep 2023
Previous
123...101112...212223
Next