HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,107 papers shown
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
43
6
0
13 Apr 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Jiatong Shi
Yun Tang
Ann Lee
Hirofumi Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
51
9
0
10 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
24
20
0
10 Apr 2023
ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi
Vahid Saeedi
J. Peymanfard
Leila Alhabib
Hossein Zeinali
24
2
0
07 Apr 2023
On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
Yi Zhu
Mohamed Imoussaïne-Aïkous
Carolyn Côté-Lussier
Tiago H. Falk
20
4
0
05 Apr 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Yuancheng Wang
Zeqian Ju
Xuejiao Tan
Lei He
Zhizheng Wu
Jiang Bian
Sheng Zhao
DiffM
19
47
0
03 Apr 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
Jiaheng Liu
DiffM
VGen
28
13
0
29 Mar 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
34
9
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Hauret Julien
Joubaud Thomas
V. Zimpfer
Bavu Éric
21
6
0
17 Mar 2023
PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
Yuning Wu
Jiatong Shi
Tao Qian
Dongji Gao
Qin Jin
33
5
0
15 Mar 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
15
3
0
15 Mar 2023
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
22
14
0
14 Mar 2023
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
Mingshuai Liu
Shubo Lv
Zihan Zhang
Ru Han
Xiang Hao
Xianjun Xia
Li Chen
Yijian Xiao
Linfu Xie
21
6
0
14 Mar 2023
An End-to-End Neural Network for Image-to-Audio Transformation
Liu Chen
Michael Deisher
Munir Georges
26
3
0
10 Mar 2023
Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports
Hyunseung Chung
Jiho Kim
Joon-Myoung Kwon
K. Jeon
Min Sung Lee
Edward Choi
MedIm
11
15
0
09 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
17
7
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
38
173
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
36
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
41
4
0
05 Mar 2023
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
35
1
0
04 Mar 2023
Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints
P. Magron
Tuomas Virtanen
32
0
0
03 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
39
1
0
03 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Jun Rekimoto
48
19
0
03 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
Yingting Li
Ambuj Mehrish
Shuaijiang Zhao
Rishabh Bhardwaj
Amir Zadeh
Navonil Majumder
Rada Mihalcea
Soujanya Poria
AAML
29
16
0
02 Mar 2023
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition
P. Klumpp
Pooja Chitkara
Leda Sari
Prashant Serai
Jilong Wu
Irina-Elena Veliche
Rongqing Huang
Qing He
35
2
0
01 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
25
8
0
01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
27
8
0
28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
18
6
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
45
27
0
27 Feb 2023
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Ke Wang
Tomoki Koriyama
Yuki Saito
Takaaki Saeki
Detai Xin
Hiroshi Saruwatari
28
7
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
28
6
0
27 Feb 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment
Huaying Xue
Xiulian Peng
Yan Lu
54
4
0
26 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
22
3
0
24 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo
Chaoran Liu
C. Ishi
H. Ishiguro
BDL
27
12
0
16 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Speaker-Independent Acoustic-to-Articulatory Speech Inversion
Peter Wu
Li-Wei Chen
Cheol Jun Cho
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
18
26
0
14 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Sudhanshu Srivastava
Ishika Gupta
Anusha Prakash
Jom Kuriakose
H. Murthy
VLM
21
1
0
13 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
19
35
0
08 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
24
191
0
07 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
Yuhta Takida
Masaaki Imaizumi
Takashi Shibuya
Chieh-Hsin Lai
Toshimitsu Uesaka
Naoki Murata
Yuki Mitsufuji
GAN
26
12
0
30 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
151
318
0
30 Jan 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
37
18
0
30 Jan 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
49
473
0
29 Jan 2023
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker
Navjot Kaur
Paige Tuttosi
29
2
0
29 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
55
2
0
26 Jan 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Shahar Lutati
Eliya Nachmani
Lior Wolf
DiffM
40
14
0
25 Jan 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
25
16
0
25 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
40
15
0
21 Jan 2023