HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
arXiv:2010.05646

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
20 Jun 2025
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
Tuan-Nam Nguyen
Ngoc-Quan Pham
Seymanur Akti
Alexander Waibel
19 Jun 2025
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
Yunkee Chae
Kyogu Lee
19 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
18 Jun 2025
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li
Aditya Grover
18 Jun 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
17 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
16 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM, AuLLM, VLM
16 Jun 2025
Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
Xiaodan Chen
Xiaoxue Gao
M. Quoy
Alexandre Pitti
Nancy F. Chen
13 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
12 Jun 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
12 Jun 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yukun Ma
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
11 Jun 2025
Training-Free Voice Conversion with Factorized Optimal Transport
Alexander Lobashev
Assel Yermekova
Maria Larchenko
11 Jun 2025
A Review on Score-based Generative Models for Audio Applications
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM, MedIm
10 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
10 Jun 2025
Spectral Domain Neural Reconstruction for Passband FMCW Radars
Harshvardhan Takawale
Nirupam Roy
09 Jun 2025
Neural Spectral Band Generation for Audio Coding
Woongjib Choi
Byeong Hyeon Kim
Hyungseob Lim
Inseon Jang
Hong-Goo Kang
07 Jun 2025
Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages
Utkarsh Pathak
Chandra Sai Krishna Gunda
Anusha Prakash
Keshav Agarwal
Hema A. Murthy
04 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
04 Jun 2025
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura
Takuya Hasumi
Yuma Shirahata
Ryuichi Yamamoto
MQ
04 Jun 2025
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti
T. Nguyen
Alexander Waibel
DRL
04 Jun 2025
Conformer-based Ultrasound-to-Speech Conversion
Ibrahim Ibrahimov
Zainkó Csaba
Gábor Gosztolya
MedIm
04 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
04 Jun 2025
Trusted Fake Audio Detection Based on Dirichlet Distribution
Chi Ding
Junxiao Xue
Cong Wang
Hao Zhou
03 Jun 2025
SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
Yixuan Hou
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
03 Jun 2025
Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
Karl El Hajal
Enno Hermann
Sevada Hovsepyan
Mathew Magimai.-Doss
02 Jun 2025
Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg
Caitlyn Bishop
Han Yi
Sarah Robinson
02 Jun 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Tianrui Pan
Jie Liu
Z. Huang
Jie Tang
Gangshan Wu
01 Jun 2025
PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
Songjun Cao
Qinghua Wu
Jie Chen
Jin Li
Long Ma
01 Jun 2025
ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
Pengyu Ren
Wenhao Guan
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
01 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
01 Jun 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
31 May 2025
When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds
Minsu Kang
Seolhee Lee
Choonghyeon Lee
Namhyun Cho
VLM
30 May 2025
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
Jin Wang
Wenbin Jiang
Xiangbo Wang
30 May 2025
Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem
Andres Fernandez
Juan Azcarreta
Cagdas Bilen
Jesus Monge Alvarez
30 May 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah
Matthew Baas
Bernd Möbius
Dietrich Klakow
30 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
28 May 2025
Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
Haixin Zhao
Nilesh Madhu
27 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
27 May 2025
VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion
Joon-Seung Choi
Dong-Min Byun
Hyung-Seok Oh
Seong-Whan Lee
27 May 2025
GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
Seokgi Lee
Jungjun Kim
TTA
26 May 2025
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Anton Firc
Manasi Chibber
Jagabandhu Mishra
Vishwanath Pratap Singh
Tomi Kinnunen
K. Malinka
26 May 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM, AI4TS
25 May 2025
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
Zhichao Wu
Yueteng Kang
Songjun Cao
Long Ma
Qiulin Li
Qun Yang
DiffM
24 May 2025
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
23 May 2025
Private kNN-VC: Interpretable Anonymization of Converted Speech
Carlos Franzreb
Arnab Das
Tim Polzehl
Sebastian Möller
23 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL, MoMe
23 May 2025
LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context
Natsuo Yamashita
Masaaki Yamamoto
Hiroaki Kokubo
Yohei Kawaguchi
23 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
22 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
22 May 2025