2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
30
0
0
20 Jun 2025
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
Tuan-Nam Nguyen
Ngoc-Quan Pham
Seymanur Akti
Alexander Waibel
36
0
0
19 Jun 2025
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
Yunkee Chae
Kyogu Lee
36
0
0
19 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
62
0
0
18 Jun 2025
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li
Aditya Grover
21
0
0
18 Jun 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
41
0
0
17 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
29
0
0
16 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM
AuLLM
VLM
75
0
0
16 Jun 2025
Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
Xiaodan Chen
Xiaoxue Gao
M. Quoy
Alexandre Pitti
Nancy F. Chen
39
0
0
13 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
119
0
0
12 Jun 2025
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
Yisi Liu
Chenyang Wang
Hanjo Kim
Raniya Khan
Gopala Anumanchipalli
124
0
0
12 Jun 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yukun Ma
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
91
0
0
11 Jun 2025
Training-Free Voice Conversion with Factorized Optimal Transport
Alexander Lobashev
Assel Yermekova
Maria Larchenko
72
0
0
11 Jun 2025
A Review on Score-based Generative Models for Audio Applications
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
48
0
0
10 Jun 2025
Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
27
0
0
10 Jun 2025
Spectral Domain Neural Reconstruction for Passband FMCW Radars
Harshvardhan Takawale
Nirupam Roy
24
0
0
09 Jun 2025
Neural Spectral Band Generation for Audio Coding
Woongjib Choi
Byeong Hyeon Kim
Hyungseob Lim
Inseon Jang
Hong-Goo Kang
36
0
0
07 Jun 2025
Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages
Utkarsh Pathak
Chandra Sai Krishna Gunda
Anusha Prakash
Keshav Agarwal
Hema A. Murthy
70
0
0
04 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
134
0
0
04 Jun 2025
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura
Takuya Hasumi
Yuma Shirahata
Ryuichi Yamamoto
MQ
63
0
0
04 Jun 2025
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti
T. Nguyen
Alexander Waibel
DRL
154
0
0
04 Jun 2025
Conformer-based Ultrasound-to-Speech Conversion
Ibrahim Ibrahimov
Zainkó Csaba
Gábor Gosztolya
MedIm
80
0
0
04 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
154
0
0
04 Jun 2025
Trusted Fake Audio Detection Based on Dirichlet Distribution
Chi Ding
Junxiao Xue
Cong Wang
Hao Zhou
62
0
0
03 Jun 2025
SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
Yixuan Hou
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
65
0
0
03 Jun 2025
Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
Karl El Hajal
Enno Hermann
Sevada Hovsepyan
Mathew Magimai.-Doss
58
0
0
02 Jun 2025
Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg
Caitlyn Bishop
Han Yi
Sarah Robinson
80
0
0
02 Jun 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Tianrui Pan
Jie Liu
Z. Huang
Jie Tang
Gangshan Wu
66
0
0
01 Jun 2025
PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
Songjun Cao
Qinghua Wu
Jie Chen
Jin Li
Long Ma
54
0
0
01 Jun 2025
ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
Pengyu Ren
Wenhao Guan
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
55
0
0
01 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
55
0
0
01 Jun 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
45
0
0
31 May 2025
When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds
Minsu Kang
Seolhee Lee
Choonghyeon Lee
Namhyun Cho
VLM
38
0
0
30 May 2025
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
Jin Wang
Wenbin Jiang
Xiangbo Wang
49
0
0
30 May 2025
Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem
Andres Fernandez
Juan Azcarreta
Cagdas Bilen
Jesus Monge Alvarez
41
0
0
30 May 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah
Matthew Baas
Bernd Möbius
Dietrich Klakow
28
0
0
30 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
35
0
0
28 May 2025
Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
Haixin Zhao
Nilesh Madhu
76
0
0
27 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
36
0
0
27 May 2025
VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion
Joon-Seung Choi
Dong-Min Byun
Hyung-Seok Oh
Seong-Whan Lee
88
0
0
27 May 2025
GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
Seokgi Lee
Jungjun Kim
TTA
121
0
0
26 May 2025
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Anton Firc
Manasi Chibber
Jagabandhu Mishra
Vishwanath Pratap Singh
Tomi Kinnunen
K. Malinka
179
0
0
26 May 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM
AI4TS
64
0
0
25 May 2025
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
Zhichao Wu
Yueteng Kang
Songjun Cao
Long Ma
Qiulin Li
Qun Yang
DiffM
62
0
0
24 May 2025
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
130
0
0
23 May 2025
Private kNN-VC: Interpretable Anonymization of Converted Speech
Carlos Franzreb
Arnab Das
Tim Polzehl
Sebastian Möller
35
0
0
23 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL
MoMe
208
0
0
23 May 2025
LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context
Natsuo Yamashita
Masaaki Yamamoto
Hiroaki Kokubo
Yohei Kawaguchi
41
0
0
23 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
74
0
0
22 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
52
0
0
22 May 2025