arXiv: 2010.05646
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,107 papers shown
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
Kenichi Fujita
Atsushi Ando
Yusuke Ijima
26
2
0
11 Feb 2024
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
Haocheng Liu
Teysir Baoueb
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
37
4
0
09 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
56
34
0
08 Feb 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
82
103
0
07 Feb 2024
KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
Guochen Yu
Runqiang Han
Chenglin Xu
Haoran Zhao
Nan Li
Chen Zhang
Xiguang Zheng
Chao Zhou
Qi Huang
Bin Yu
17
3
0
02 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
39
12
0
01 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
38
3
0
31 Jan 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
21
1
0
31 Jan 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
61
41
0
30 Jan 2024
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
32
5
0
30 Jan 2024
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
Akshit Arora
Rohan Badlani
Sungwon Kim
Rafael Valle
Bryan Catanzaro
11
0
0
24 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
33
0
0
23 Jan 2024
Adversarial speech for voice privacy protection from Personalized Speech generation
Shihao Chen
Liping Chen
Jie Zhang
KongAik Lee
Zhenhua Ling
Lirong Dai
AAML
15
1
0
22 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
26
1
0
19 Jan 2024
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
44
4
0
17 Jan 2024
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
34
12
0
16 Jan 2024
DIFFRENT: A Diffusion Model for Recording Environment Transfer of Speech
Jae-Yeol Im
Juhan Nam
DiffM
20
3
0
16 Jan 2024
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
Yimin Deng
Huaizhen Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
DRL
44
1
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
52
2
0
16 Jan 2024
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
Ye-Xin Lu
Yang Ai
Hui-Peng Du
Zhenhua Ling
30
6
0
12 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
41
8
0
10 Jan 2024
A Good Score Does not Lead to A Good Generative Model
Sixu Li
Shi Chen
Qin Li
DiffM
79
15
0
10 Jan 2024
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
31
1
0
09 Jan 2024
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
Yusheng Tian
Jingyu Li
Tan Lee
24
0
0
08 Jan 2024
BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators
Zihan Zhang
Jiayao Sun
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
34
3
0
08 Jan 2024
DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
Jayeon Yi
Junghyun Koo
Kyogu Lee
22
2
0
08 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
43
5
0
07 Jan 2024
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang
Y. Kartynnik
Yunpeng Li
Jiuqiang Tang
Xing Li
George Sung
Matthias Grundmann
36
12
0
05 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
40
2
0
04 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
23
0
0
03 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
30
5
0
02 Jan 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
DiffM
23
29
0
02 Jan 2024
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Yuhta Takida
Yukara Ikemiya
Takashi Shibuya
Kazuki Shimada
Woosung Choi
...
Naoki Murata
Toshimitsu Uesaka
Kengo Uchida
Wei-Hsiang Liao
Yuki Mitsufuji
BDL
51
12
0
31 Dec 2023
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
Yun Chen
Lingxiao Yang
Qi Chen
Jianhuang Lai
Xiaohua Xie
40
3
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
147
0
28 Dec 2023
Accent-VITS:accent transfer for end-to-end TTS
Linhan Ma
Yongmao Zhang
Xinfa Zhu
Yinjiao Lei
Ziqian Ning
Pengcheng Zhu
Lei Xie
27
7
0
28 Dec 2023
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
60
77
0
25 Dec 2023
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
41
2
0
23 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
34
22
0
22 Dec 2023
BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution
Guochen Yu
Xiguang Zheng
Nan Li
Runqiang Han
C. Zheng
Chen Zhang
Chao Zhou
Qi Huang
Bin Yu
69
5
0
21 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
31
8
0
21 Dec 2023
BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
73
3
0
21 Dec 2023
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng
50
7
0
19 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
42
17
0
19 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
36
9
0
17 Dec 2023
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
33
2
0
15 Dec 2023
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Jiahe Lan
Jie Wang
Baochen Yan
Zheng Yan
Elisa Bertino
AAML
35
10
0
15 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
50
13
0
14 Dec 2023
Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification
Haibin Wu
Heng-Cheng Kuo
Yu Tsao
Hung-yi Lee
AAML
32
1
0
14 Dec 2023