ResearchTrend.AI
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,107 papers shown
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
30
22
0
20 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
23
3
0
10 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
48
654
0
05 Jan 2023
Towards Voice Reconstruction from EEG during Imagined Speech
Young-Eun Lee
Seo-Hyun Lee
Sang-Ho Kim
Seong-Whan Lee
32
35
0
02 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
39
22
0
30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
29
18
0
29 Dec 2022
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Jean-Marie Lemercier
Julius Richter
Simon Welker
Timo Gerkmann
DiffM
159
81
0
22 Dec 2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
27
12
0
21 Dec 2022
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
39
13
0
19 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
Tomoki Toda
DiffM
22
7
0
16 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Hirofumi Inaguma
Sravya Popuri
Ilia Kulikov
Peng-Jen Chen
Changhan Wang
Yu-An Chung
Yun Tang
Ann Lee
Shinji Watanabe
J. Pino
55
53
0
15 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
25
0
0
15 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
41
1
0
11 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
30
4
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming-Hsuan Yang
Qin Huang
80
26
0
08 Dec 2022
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
J. Nam
Sangwoo Mo
Jaeho Lee
Jinwoo Shin
29
7
0
05 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
23
8
0
03 Dec 2022
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
40
21
0
01 Dec 2022
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
Paul-Gauthier Noé
Xiaoxiao Miao
Xin Wang
Junichi Yamagishi
J. Bonastre
D. Matrouf
23
7
0
29 Nov 2022
Neural Vocoder Feature Estimation for Dry Singing Voice Separation
Jae-Yeol Im
Soonbeom Choi
Sangeon Yong
Juhan Nam
32
1
0
29 Nov 2022
Contextual Expressive Text-to-Speech
Jianhong Tu
Zeyu Cui
Xiaohuan Zhou
Siqi Zheng
Kaiqin Hu
Ju Fan
Chang Zhou
22
2
0
26 Nov 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
O. Watts
Lovisa Wihlborg
Cassia Valentini-Botinhao
38
3
0
25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
26
1
0
25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
27
1
0
25 Nov 2022
Voice-preserving Zero-shot Multiple Accent Conversion
Mumin Jin
Prashant Serai
Jilong Wu
Andros Tjandra
Vimal Manohar
Qing He
19
12
0
23 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
27
91
0
22 Nov 2022
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
Takenori Yoshimura
Shinji Takaki
Kazuhiro Nakamura
Keiichiro Oura
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
37
7
0
21 Nov 2022
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
Rodrigo Mira
Buye Xu
Jacob Donley
Anurag Kumar
Stavros Petridis
V. Ithapu
Maja Pantic
28
13
0
20 Nov 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
40
14
0
19 Nov 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling
Xinfa Zhu
Yinjiao Lei
Kun Song
Yongmao Zhang
Tao Li
Linfu Xie
21
17
0
19 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
43
18
0
17 Nov 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
Yiwei Guo
Chenpeng Du
Xie Chen
K. Yu
DiffM
67
40
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
20
46
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
25
48
0
17 Nov 2022
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
DRL
25
5
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
21
1
0
16 Nov 2022
Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder
Yuying Xie
Thomas Arildsen
Zheng-Hua Tan
26
2
0
15 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
16
9
0
13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
41
12
0
13 Nov 2022
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
32
6
0
12 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
Yoorim Oh
Juheon Lee
Yoseob Han
Kyogu Lee
28
3
0
11 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
Gaku Narita
Junichi Shimizu
Taketo Akama
GAN
34
11
0
10 Nov 2022
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
Ziqian Ning
Qicong Xie
Pengcheng Zhu
Zhichao Wang
Liumeng Xue
Jixun Yao
Linfu Xie
Mengxiao Bi
37
16
0
09 Nov 2022
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
Junhyeok Lee
Seungu Han
Hyunjae Cho
Wonbin Jung
27
11
0
08 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Jixun Yao
Qing Wang
Yi Lei
Pengcheng Guo
Linfu Xie
Namin Wang
Jie Liu
40
14
0
06 Nov 2022
Preserving background sound in noise-robust voice conversion via multi-task learning
Jixun Yao
Yi Lei
Qing Wang
Pengcheng Guo
Ziqian Ning
Linfu Xie
Hai Li
Junhui Liu
Danming Xie
44
10
0
06 Nov 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Yongmao Zhang
Heyang Xue
Hanzhao Li
Linfu Xie
Tingwei Guo
Ruixiong Zhang
Caixia Gong
DiffM
VLM
29
28
0
05 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis
Bryce Irvin
Marko Stamenovic
M. Kegler
Li-Chia Yang
43
18
0
04 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
43
18
0
04 Nov 2022