ResearchTrend.AI
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,107 papers shown
Title
PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Robin Netzorg
A. Jalal
Luna McNulty
Gopala Krishna Anumanchipalli
DiffM
33
1
0
13 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
41
2
0
11 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schönherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
43
13
0
10 Dec 2023
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling
Ruihan Yang
H. Gamper
Sebastian Braun
DiffM
36
5
0
08 Dec 2023
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Binzhu Sha
Xu Li
Zhiyong Wu
Yin Shan
Helen M. Meng
23
7
0
08 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
56
21
0
06 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
33
29
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
39
12
0
05 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee
Yejin Jeon
Wonjun Lee
Yunsu Kim
Gary Geunbae Lee
17
1
0
04 Dec 2023
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
37
20
0
03 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
38
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
40
0
0
02 Dec 2023
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah
Sreyan Ghosh
Sonal Kumar
Purva Chiniya
Dinesh Manocha
46
14
0
30 Nov 2023
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Gokul Srinivasagan
Michael Deisher
Munir Georges
VLM
27
0
0
30 Nov 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
Sébastien Marcel
CVBM
56
4
0
29 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
35
31
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
25
1
0
20 Nov 2023
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness
Mathias Vogel
DiffM
45
0
0
17 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li
Xinfa Zhu
Liumeng Xue
Yang Song
Yunlin Chen
Lei Xie
48
7
0
13 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
34
24
0
08 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
34
0
0
27 Oct 2023
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
Yongxin Zhu
Zhujin Gao
Xinyuan Zhou
Zhongyi Ye
Linli Xu
34
2
0
26 Oct 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
28
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
32
6
0
26 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Paweł Skórzewski
Marcin Sowański
Tomasz Ziętkiewicz
21
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
27
31
0
25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
59
0
0
23 Oct 2023
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
46
5
0
19 Oct 2023
Black-Box Training Data Identification in GANs via Detector Networks
Lukman Olagoke
Salil P. Vadhan
Seth Neel
31
0
0
18 Oct 2023
A High Fidelity and Low Complexity Neural Audio Coding
Wenzhe Liu
Wei Xiao
Meng Wang
Shan Yang
Yupeng Shi
Yuyong Kang
Dan Su
Shidong Shang
Dong Yu
22
2
0
17 Oct 2023
Generation or Replication: Auscultating Audio Latent Diffusion Models
Dimitrios Bralios
Gordon Wichern
François Germain
Zexu Pan
Sameer Khurana
Chiori Hori
Jonathan Le Roux
DiffM
27
6
0
16 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
23
3
0
14 Oct 2023
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
Tiberiu Boros
Stefan Daniel Dumitrescu
Ionut Mironica
Radu Chivereanu
GAN
22
1
0
14 Oct 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
25
3
0
12 Oct 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Qingkai Fang
Yan Zhou
Yangzhou Feng
45
7
0
11 Oct 2023
Enhancing expressivity transfer in textless speech-to-speech translation
J. Duret
Benjamin O’Brien
Yannick Estève
Titouan Parcollet
51
2
0
11 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
45
16
0
11 Oct 2023
AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion
Haeyun Choi
Jio Gim
Yuho Lee
Youngin Kim
Young-Joo Suh
BDL
29
1
0
10 Oct 2023
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
Detai Xin
Junfeng Jiang
Shinnosuke Takamichi
Yuki Saito
Akiko Aizawa
Hiroshi Saruwatari
27
11
0
09 Oct 2023
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
Jiaqi Li
Li Wang
Liumeng Xue
Lei Wang
Zhizheng Wu
AAML
43
3
0
09 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
46
13
0
08 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
23
7
0
08 Oct 2023
Unified speech and gesture synthesis using flow matching
Shivam Mehta
Ruibo Tu
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
55
3
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
26
5
0
08 Oct 2023
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
Yuanjun Lv
Jixun Yao
Peikun Chen
Hongbin Zhou
Heng Lu
Lei Xie
33
5
0
08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
1
0
08 Oct 2023
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Yayun He
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
DiffM
27
2
0
07 Oct 2023
Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction
Jiawei Li
Chunxu Guo
Li Fu
Lu Fan
Edward F. Chang
Yuanning Li
14
2
0
07 Oct 2023
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction
Jiarui Hai
Helin Wang
Dongchao Yang
Karan Thakkar
Najim Dehak
Mounya Elhilali
DiffM
31
7
0
06 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
37
3
0
06 Oct 2023