Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
105
53
0
08 Feb 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
144
117
0
07 Feb 2024
KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
Guochen Yu
Runqiang Han
Chenglin Xu
Haoran Zhao
Nan Li
Chen Zhang
Xiguang Zheng
Chao Zhou
Qi Huang
Bin Yu
34
4
0
02 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
96
13
0
01 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
82
4
0
31 Jan 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
50
1
0
31 Jan 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
177
54
0
30 Jan 2024
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
87
6
0
30 Jan 2024
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
Akshit Arora
Rohan Badlani
Sungwon Kim
Rafael Valle
Bryan Catanzaro
35
0
0
24 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
116
0
0
23 Jan 2024
Adversarial speech for voice privacy protection from Personalized Speech generation
Shihao Chen
Liping Chen
Jie Zhang
KongAik Lee
Zhenhua Ling
Lirong Dai
AAML
58
1
0
22 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
250
64
0
22 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
41
1
0
19 Jan 2024
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
93
4
0
17 Jan 2024
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
80
16
0
16 Jan 2024
DIFFRENT: A Diffusion Model for Recording Environment Transfer of Speech
Jae-Yeol Im
Juhan Nam
DiffM
65
3
0
16 Jan 2024
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
Yimin Deng
Huaizhen Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
DRL
91
1
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
167
1
0
16 Jan 2024
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
Ye-Xin Lu
Yang Ai
Hui-Peng Du
Zhenhua Ling
93
10
0
12 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
77
9
0
10 Jan 2024
A Good Score Does not Lead to A Good Generative Model
Sixu Li
Shi Chen
Qin Li
DiffM
138
18
0
10 Jan 2024
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
75
3
0
09 Jan 2024
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
Yusheng Tian
Jingyu Li
Tan Lee
46
1
0
08 Jan 2024
BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators
Zihan Zhang
Jiayao Sun
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
54
4
0
08 Jan 2024
DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
Jayeon Yi
Junghyun Koo
Kyogu Lee
64
2
0
08 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
90
5
0
07 Jan 2024
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang
Y. Kartynnik
Yunpeng Li
Jiuqiang Tang
Xing Li
George Sung
Matthias Grundmann
112
15
0
05 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
83
2
0
04 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
65
0
0
03 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
62
7
0
02 Jan 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
DiffM
110
36
0
02 Jan 2024
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Yuhta Takida
Yukara Ikemiya
Takashi Shibuya
Kazuki Shimada
Woosung Choi
...
Naoki Murata
Toshimitsu Uesaka
Kengo Uchida
Wei-Hsiang Liao
Yuki Mitsufuji
BDL
118
13
0
31 Dec 2023
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
Yun Chen
Lingxiao Yang
Qi Chen
Jianhuang Lai
Xiaohua Xie
76
4
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
102
175
0
28 Dec 2023
Accent-VITS:accent transfer for end-to-end TTS
Linhan Ma
Yongmao Zhang
Xinfa Zhu
Yinjiao Lei
Ziqian Ning
Pengcheng Zhu
Lei Xie
94
7
0
28 Dec 2023
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
135
94
0
25 Dec 2023
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
88
6
0
23 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
116
25
0
22 Dec 2023
BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution
Guochen Yu
Xiguang Zheng
Nan Li
Runqiang Han
C. Zheng
Chen Zhang
Chao Zhou
Qi Huang
Bin Yu
129
6
0
21 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
61
9
0
21 Dec 2023
BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
144
3
0
21 Dec 2023
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng
86
8
0
19 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
97
19
0
19 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
85
14
0
17 Dec 2023
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
86
2
0
15 Dec 2023
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Jiahe Lan
Jie Wang
Baochen Yan
Zheng Yan
Elisa Bertino
AAML
105
11
0
15 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
115
18
0
14 Dec 2023
Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification
Haibin Wu
Heng-Cheng Kuo
Yu Tsao
Hung-yi Lee
AAML
67
2
0
14 Dec 2023
PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Robin Netzorg
A. Jalal
Luna McNulty
Gopala Krishna Anumanchipalli
DiffM
51
1
0
13 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
71
3
0
11 Dec 2023
Previous
1
2
3
...
9
10
11
...
22
23
24
Next