2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
150
2
0
06 Dec 2024
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
Jiaxuan Liu
Zhaoci Liu
Yihan Hu
Yingying Gao
Shilei Zhang
Zhenhua Ling
DiffM
123
2
0
04 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
141
16
0
29 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
122
1
0
25 Nov 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
88
0
0
18 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
64
3
0
14 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen-An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
116
5
0
11 Nov 2024
Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation
Wilhelm Ågren
Victorio Úbeda Sosa
90
0
0
11 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
Tomoki Toda
80
2
0
11 Nov 2024
Large Generative Model-assisted Talking-face Semantic Communication System
Feibo Jiang
Siwei Tu
Li Dong
Cunhua Pan
Jiangzhou Wang
Xiaohu You
62
3
0
06 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
132
9
0
04 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yanjie Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
87
17
0
02 Nov 2024
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Hui-Peng Du
Ye-Xin Lu
Zhen-Hua Ling
117
3
0
01 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
128
0
0
31 Oct 2024
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
Ioannis Tsiamas
Matthias Sperber
Andrew Finch
Sarthak Garg
65
1
0
31 Oct 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
69
0
0
30 Oct 2024
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
Hui-Peng Du
Yang Ai
Rui Zheng
Zhen-Hua Ling
76
2
0
30 Oct 2024
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
Kehan Sui
Jinxu Xiang
Fang Jin
DiffM
49
0
0
29 Oct 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
129
9
0
29 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Ella Zeldes
Or Tal
Yossi Adi
64
1
0
28 Oct 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang
Qianyi Yang
Derui Wang
Pengyang Huang
Yuxin Cao
Kai Ye
Jie Hao
AAML
71
3
0
28 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
124
0
0
24 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
50
0
0
23 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
155
20
0
23 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
115
3
0
21 Oct 2024
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh
Tim Thiele
Frederic Lorbeer
Frank Dreyer
Sebastian Stober
73
0
0
20 Oct 2024
ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
Yulin Song
Guorui Sang
Jing Yu
Chuangbai Xiao
DiffM
70
1
0
20 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Md Mubtasim Ahasan
Md Fahim
Tasnim Mohiuddin
A K M Mahbubur Rahman
Aman Chadha
Tariq Iqbal
M. A. Amin
Md. Mofijul Islam
Amin Ahsan Ali
108
1
0
19 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
118
2
0
19 Oct 2024
SNAC: Multi-Scale Neural Audio Codec
Hubert Siuzdak
Florian Grötschla
Luca A. Lanzendörfer
58
19
0
18 Oct 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
Evgeny Burnaev
OT
70
2
0
17 Oct 2024
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
Abhishek Gupta
Amruta Parulekar
Sameep Chattopadhyay
Preethi Jyothi
VLM
60
0
0
17 Oct 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
Yu Gu
Qiushi Zhu
Guangzhi Lei
Chao Weng
Jane Polak Scowcroft
DiffM
74
0
0
17 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
123
9
0
16 Oct 2024
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model
Jianwei Cui
Yu Gu
Chao Weng
Jie Zhang
Liping Chen
Lirong Dai
97
4
0
16 Oct 2024
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
Saksham Singh Kushwaha
Jianbo Ma
Mark R. P. Thomas
Yapeng Tian
Avery Bruni
62
1
0
15 Oct 2024
Audio-based Kinship Verification Using Age Domain Conversion
Qiyang Sun
Alican Akman
Xin Jing
M. Milling
Björn Schuller
51
1
0
14 Oct 2024
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
121
8
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
55
0
0
09 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Xipeng Qiu
AuLLM
114
9
0
09 Oct 2024
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
Chenxing Li
Manjie Xu
Dong Yu
DiffM
55
0
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
95
4
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
132
5
0
08 Oct 2024
Stage-Wise and Prior-Aware Neural Speech Phase Prediction
Fei Liu
Yang Ai
Hui-Peng Du
Ye-Xin Lu
Rui Zheng
Zhen-Hua Ling
63
0
0
07 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffM
VGen
LLMAG
121
4
0
04 Oct 2024
NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin
Hieu-Thi Luong
Jixun Yao
Lei Xie
Kong Aik Lee
Eng Siong Chng
108
1
0
03 Oct 2024
SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model
Xinlei Niu
Jing Zhang
Charles Patrick Martin
52
2
0
03 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
105
3
0
03 Oct 2024
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation
Mingzhen Sun
Weining Wang
Yanyuan Qiao
Jiahui Sun
Zihan Qin
Longteng Guo
Xinxin Zhu
Jing Liu
DiffM
VGen
62
3
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
172
3
0
02 Oct 2024