Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders
Yuanchao Li
Korin Richmond
Simon King
78
0
0
21 May 2025
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
Jixun Yao
Hexin Liu
Eng Siong Chng
Lei Xie
66
0
0
21 May 2025
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
Yuhao Zhang
Xiangnan Ma
Kaiqi Kou
Peizhuo Liu
Weiqiao Shan
Benyou Wang
Tong Xiao
Yuxin Huang
Zhengtao Yu
Jingbo Zhu
VLM
40
0
0
21 May 2025
Pairwise Evaluation of Accent Similarity in Speech Synthesis
Jinzuomu Zhong
Suyuan Liu
Dan Wells
Korin Richmond
117
0
0
20 May 2025
Articulatory Feature Prediction from Surface EMG during Speech Production
Jihwan Lee
Kevin Huang
Kleanthis Avramidis
Simon Pistrosch
Monica Gonzalez-Machorro
Yoonjeong Lee
Björn Schuller
Louis Goldstein
Shrikanth Narayanan
168
1
0
20 May 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Yu Zhang
Wenxiang Guo
Changhao Pan
Dongyu Yao
Zhiyuan Zhu
Ziyue Jiang
Yuhan Wang
Tao Jin
Zhou Zhao
VLM
129
0
0
20 May 2025
More-than-Human Storytelling: Designing Longitudinal Narrative Engagements with Generative AI
Émilie Fabre
Katie Seaborn
Shuta Koiwai
Mizuki Watanabe
Paul Riesch
AI4CE
73
1
0
20 May 2025
RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations
Seungmin Kim
Sohee Park
Donghyun Kim
Jisu Lee
Daeseon Choi
AAML
61
0
0
19 May 2025
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Biel Tura Vecino
Subhadeep Maji
Aravind Varier
Antonio Bonafonte
Ivan Valles
...
Roberto Barra-Chicote
Ariya Rastrow
C. Papayiannis
Volker Leutnant
Trevor Wood
48
0
0
19 May 2025
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Gerkmann
DiffM
115
0
0
16 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
63
0
0
14 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
204
3
0
12 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
93
0
0
12 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
112
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
351
1
0
05 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
106
4
0
05 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
446
0
0
02 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
140
1
0
29 Apr 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
144
0
0
29 Apr 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
84
0
0
27 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
229
2
0
27 Apr 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
77
0
0
25 Apr 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
113
0
0
22 Apr 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
124
0
0
22 Apr 2025
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
Yue Li
Weizhi Liu
Dongdong Lin
176
0
0
21 Apr 2025
MusFlow: Multimodal Music Generation via Conditional Flow Matching
Jiahao Song
Yuzhao Wang
88
0
0
18 Apr 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
207
3
0
17 Apr 2025
Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
P. O'Reilly
Zeyu Jin
Jiaqi Su
Bryan Pardo
100
0
0
15 Apr 2025
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
Botao Zhao
Zuheng Kang
Yayun He
Xiaoyang Qu
Junqing Peng
Jing Xiao
Jianzong Wang
77
0
0
15 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
125
1
0
14 Apr 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
Yubing Cao
Yinfeng Yu
Yongming Li
Liejun Wang
69
0
0
12 Apr 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
168
0
0
11 Apr 2025
On the Design of Diffusion-based Neural Speech Codecs
Pietro Foti
Andreas Brendel
DiffM
91
0
0
11 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
170
14
0
11 Apr 2025
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
Kaidi Wang
Wenhao Guan
Shenghui Lu
Jianglong Yao
Lin Li
Q. Hong
187
3
0
10 Apr 2025
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Zhen-Hua Ling
64
0
0
09 Apr 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
Yuhao Wang
Heyang Liu
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
474
3
0
05 Apr 2025
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
Hedi Naouara
Jean-Pierre Lorré
Jérôme Louradour
85
0
0
03 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
300
1
0
01 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
161
0
0
29 Mar 2025
ReverBERT: A State Space Model for Efficient Text-Driven Speech Style Transfer
Michael Brown
Sofia Martinez
Priya Singh
74
0
0
26 Mar 2025
Measuring the Robustness of Audio Deepfake Detectors
Xiang Li
Pin-Yu Chen
Wenqi Wei
79
0
0
21 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
87
1
0
21 Mar 2025
HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks
Ekaterina Dmitrieva
Maksim Kaledin
131
0
0
21 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Yongqian Li
132
0
0
21 Mar 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
101
0
0
20 Mar 2025
Serenade: A Singing Style Conversion Framework Based On Audio Infilling
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
69
0
0
16 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
148
1
0
15 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
152
2
0
15 Mar 2025
Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models
Hao-Ran Cheng
Erjia Xiao
Yichi Wang
Kaidi Xu
Mengshu Sun
Jindong Gu
Renjing Xu
93
0
0
14 Mar 2025
Previous
1
2
3
4
5
...
22
23
24
Next