2010.05646
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models
Hao-Ran Cheng
Erjia Xiao
Yichi Wang
Kaidi Xu
Mengshu Sun
Jindong Gu
Renjing Xu
93
0
0
14 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
112
0
0
11 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
102
0
0
10 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
97
1
0
07 Mar 2025
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker
Matthew Le
Ricky T. Q. Chen
Wei-Ning Hsu
Timo Gerkmann
Alexander Richard
Yi-Chiao Wu
106
1
0
03 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xiang Wang
Mingqi Jiang
Zejun Ma
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Yu Guo
Wei Xue
132
22
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
103
1
0
02 Mar 2025
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
Zizhen Lin
Junyu Wang
Ruili Li
Fei Shen
Xi Xuan
103
2
0
27 Feb 2025
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
Nian Shao
Rui Zhou
Pengyu Wang
Xian Li
Ying Fang
Yujie Yang
Xiaofei Li
127
0
0
27 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Yanzhe Zhang
Rui Liu
Xiang Yin
Zhou Zhao
153
0
0
26 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
105
0
0
26 Feb 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Ju Liu
Xuelong Li
Fangqiu Yi
DiffM
MDE
116
1
0
26 Feb 2025
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Tianpeng Li
Qingbin Liu
Tao Zhang
Yuanbo Fang
Zheng Liang
...
Bin Cui
Jianhua Xu
Haoze Sun
Guosheng Dong
Xin Wu
AuLLM
119
7
0
24 Feb 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM
ALM
99
2
0
21 Feb 2025
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shreya Shukla
Jose Torres
Abhijit Mishra
Jacek Gwizdka
Shounak Roychowdhury
127
0
0
20 Feb 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo
Yu Zhang
Changhao Pan
Rongjie Huang
Li Tang
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Zhou Zhao
302
4
0
18 Feb 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
103
1
0
17 Feb 2025
Artifact-free Sound Quality in DNN-based Closed-loop Systems for Audio Processing
Chuan Wen
Guy Torfs
Sarah Verhulst
107
0
0
17 Feb 2025
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
Zhengyan Sheng
Zhihao Du
Shiliang Zhang
Zhijie Yan
Yexin Yang
Zhenhua Ling
126
2
0
16 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
Jiajian Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
117
2
0
16 Feb 2025
The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG
Francesco Stefano Carzaniga
Gary Tom Hoppeler
Michael Hersche
Kaspar Anton Schindler
Abbas Rahimi
91
0
0
10 Feb 2025
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Xiao Li
Zekai Zhang
Xiang Li
Siyi Chen
Zhihui Zhu
Peng Wang
Qing Qu
DiffM
191
1
0
09 Feb 2025
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Mohammad Jahid Ibna Basher
Md. Kowsher
Md Saiful Islam
R. N. Nandi
Nusrat Jahan Prottasha
...
Tareq Al Muntasir
Shammur A. Chowdhury
Firoj Alam
Niloofar Yousefi
O. Garibay
99
0
0
09 Feb 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
100
2
0
08 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
105
4
0
08 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
107
0
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
189
23
0
28 Jan 2025
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Shuqi Dai
Yunyun Wang
Roger B. Dannenberg
Zeyu Jin
DiffM
126
0
0
23 Jan 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu
Nils Peters
148
0
0
22 Jan 2025
Audio Texture Manipulation by Exemplar-Based Analogy
Kan Jen Cheng
Tingle Li
Gopala Anumanchipalli
DiffM
82
1
0
21 Jan 2025
Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
Karl El Hajal
Enno Hermann
Ajinkya Kulkarni
Mathew Magimai.-Doss
160
4
0
20 Jan 2025
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation
Jaekwon Im
Juhan Nam
DiffM
116
0
0
18 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
165
0
0
15 Jan 2025
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis
Rui Liu
Zhenqi Jia
F. Bao
Hong Li
79
2
0
11 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
134
3
0
10 Jan 2025
Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation
Terrance Yu-Hao Chen
Yulin Chen
Pontus Soederhaell
Sadrishya Agrawal
Kateryna Shapovalenko
187
0
0
08 Jan 2025
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
Hsi-Che Lin
Yi-Cheng Lin
Huang-Cheng Chou
Hung-yi Lee
134
5
0
08 Jan 2025
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
Yi Yuan
Dongya Jia
Xiaobin Zhuang
Yuanzhe Chen
Zhengxi Liu
...
Yansen Wang
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
VLM
115
4
0
03 Jan 2025
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
76
1
0
31 Dec 2024
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze
M. Izadi
Shlomo Dubnov
DiffM
90
5
0
31 Dec 2024
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
72
0
0
31 Dec 2024
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
129
6
0
26 Dec 2024
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
Ruohong Yang
Peng Hu
Xi Peng
Xiting Liu
Yunfan Li
121
0
0
25 Dec 2024
HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation
Xiao Zhang
Shaoxuan Wu
Peilin Zhang
Zhuo Jin
Xiaosong Xiong
Qirong Bu
Jingkun Chen
Jun Feng
147
0
0
25 Dec 2024
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis
Zhenqi Jia
Rui Liu
73
1
0
25 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
256
5
0
22 Dec 2024
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
Lucas Goncalves
Prashant Mathur
Xing Niu
Brady Houston
Chandrashekhar Lavania
Srikanth Vishnubhotla
Lijia Sun
Anthony Ferritto
149
0
0
21 Dec 2024
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Yongjun Xu
Yizhi Zhou
Haina Zhu
Haoyang Li
KELM
432
2
0
18 Dec 2024
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
Xiangheng He
Junjie Chen
Zixing Zhang
Björn W. Schuller
125
0
0
16 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
135
2
0
14 Dec 2024