ResearchTrend.AI
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,102 papers shown
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Haibin Wu
Yuan Tseng
Hung-yi Lee
AuLLM
35
6
0
11 Jun 2024
ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
Yunyi Liu
Craig Jin
39
0
0
11 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
38
6
0
10 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
51
3
0
10 Jun 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
32
0
0
10 Jun 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Hyunjae Cho
Junhyeok Lee
Wonbin Jung
21
0
0
10 Jun 2024
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance
Semin Kim
Myeonghun Jeong
Hyeonseung Lee
Minchan Kim
Byoung Jin Choi
Nam Soo Kim
VLM
DiffM
50
1
0
10 Jun 2024
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak
Adel Moumen
Richard Dufour
Mickael Rouvier
ELM
LM&MA
MedIm
42
0
0
09 Jun 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai
Fengping Wang
Yingming Gao
Ya Li
54
0
0
09 Jun 2024
Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis
Chin-Yun Yu
Gyorgy Fazekas
34
1
0
07 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
43
68
0
07 Jun 2024
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Shuchen Shi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xuefei Liu
Yukun Liu
Yongwei Li
Zhiyong Wang
Xiaopeng Wang
32
1
0
07 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
46
7
0
07 Jun 2024
Neural Codec-based Adversarial Sample Detection for Speaker Verification
Xuanjun Chen
Jiawei Du
Haibin Wu
Jyh-Shing Roger Jang
Hung-yi Lee
40
2
0
07 Jun 2024
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer
Himanshu Maurya
A. Sigurgeirsson
30
0
0
06 Jun 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang
Qingkai Fang
Shoutao Guo
Zhengrui Ma
Min Zhang
Yang Feng
31
5
0
05 Jun 2024
Dataset-Distillation Generative Model for Speech Emotion Recognition
Fabian Ritter-Gutierrez
Kuan Po Huang
Jeremy H. M Wong
Dianwen Ng
Hung-yi Lee
Nancy F. Chen
Eng Siong Chng
DD
44
0
0
05 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
35
1
0
04 Jun 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang
Jiatong Shi
You Zhang
Ryuichi Yamamoto
Jionghao Han
...
Shengyuan Xu
Wenxiao Zhao
Jing Guo
T. Toda
Zhiyao Duan
26
10
0
04 Jun 2024
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
Ruiqi Li
Rongjie Huang
Yongqi Wang
Zhiqing Hong
Zhou Zhao
47
1
0
04 Jun 2024
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier
Simon Rouard
Jade Copet
Yossi Adi
Alexandre Défossez
30
1
0
04 Jun 2024
Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control
Ye-Xin Lu
Yang Ai
Zheng-Yan Sheng
Zhen-Hua Ling
23
1
0
04 Jun 2024
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
Hui-Peng Du
Ye-Xin Lu
Yang Ai
Zhen-Hua Ling
43
3
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
55
8
0
03 Jun 2024
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad
Maxime Jacquelin
Olivier Perrotin
Laurent Girin
Thomas Hueber
33
0
0
30 May 2024
Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI
Che Liu
Changde Du
Xiaoyu Chen
Huiguang He
38
2
0
29 May 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Haoxiang Shi
Jianzong Wang
Xulong Zhang
Ning Cheng
Jun Yu
Jing Xiao
41
2
0
27 May 2024
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
41
0
0
24 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
43
3
0
23 May 2024
Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization
Zhibo Chen
Heming Sun
Li Zhang
Fan Zhang
40
3
0
23 May 2024
MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling
Diwei Huang
Kun-Li Channing Lin
Peihao Chen
Qing Du
Mingkui Tan
VGen
42
0
0
22 May 2024
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Gwanghyun Kim
Alonso Martinez
Yu-Chuan Su
Brendan Jou
José Lezama
...
Lijun Yu
Lu Jiang
A. Jansen
Jacob Walker
Krishna Somandepalli
32
8
0
22 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
32
0
0
22 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
42
1
0
21 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
36
9
0
16 May 2024
Robust Singing Voice Transcription Serves Synthesis
Ruiqi Li
Yu Zhang
Yongqi Wang
Zhiqing Hong
Rongjie Huang
Zhou Zhao
40
7
0
16 May 2024
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
Sho Inoue
Kun Zhou
Shuai Wang
Haizhou Li
36
8
0
15 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
50
2
0
13 May 2024
Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion
Zhao Ren
Kevin Scheck
Qinhan Hou
Stefano van Gogh
Michael Wand
Tanja Schultz
DiffM
41
1
0
11 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
84
0
09 May 2024
AFEN: Respiratory Disease Classification using Ensemble Learning
Rahul Nadkarni
Emmanouil Nikolakakis
Razvan Marinescu
16
0
0
08 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
34
6
0
08 May 2024
SingIt! Singer Voice Transformation
Amit Eliav
Aaron Taub
Renana Opochinsky
Sharon Gannot
29
0
0
07 May 2024
Detecting music deepfakes is easy but actually hard
Darius Afchar
Gabriel Meseguer-Brocal
Romain Hennequin
63
6
0
07 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
36
0
0
01 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
40
18
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024