Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
v1
v2 (latest)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,154 papers shown
Title
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
119
80
0
30 May 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
66
8
0
29 May 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
105
9
0
28 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
66
4
0
28 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayutsh Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
97
15
0
27 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
143
5
0
26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
79
1
0
26 May 2023
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
80
35
0
25 May 2023
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
97
56
0
25 May 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
132
8
0
24 May 2023
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
120
45
0
24 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
67
4
0
23 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
89
19
0
23 May 2023
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings
Yuki Saito
Shinnosuke Takamichi
Eiji Iimori
Kentaro Tachibana
Hiroshi Saruwatari
79
11
0
23 May 2023
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center
Yuki Saito
Eiji Iimori
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
59
2
0
23 May 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
101
17
0
23 May 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
178
361
0
22 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
106
10
0
22 May 2023
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer
Moritz Wolter
88
4
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
138
61
0
22 May 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation
Zhe Ye
Wei Xue
Xuejiao Tan
Qi-fei Liu
Yi-Ting Guo
89
2
0
22 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
136
20
0
22 May 2023
Duplex Diffusion Models Improve Speech-to-Speech Translation
Xianchao Wu
DiffM
93
5
0
22 May 2023
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus
Detai Xin
Shinnosuke Takamichi
Ai Morimatsu
Hiroshi Saruwatari
71
10
0
21 May 2023
EE-TTS: Emphatic Expressive TTS with Linguistic Information
Yifan Zhong
Chen Zhang
Xule Liu
Chenxi Sun
Weishan Deng
Haifeng Hu
Zhongqian Sun
55
3
0
20 May 2023
Any-to-Any Generation via Composable Diffusion
Zineng Tang
Ziyi Yang
Chenguang Zhu
Michael Zeng
Joey Tianyi Zhou
VGen
DiffM
130
191
0
19 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
75
0
0
19 May 2023
A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model
Ibrahim Malik
S. Latif
Raja Jurdak
Björn Schuller
DiffM
46
9
0
19 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
90
27
0
19 May 2023
mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra
Chenhao Shuai
Chaohua Shi
Lu Gan
Hongqing Liu
78
8
0
18 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
92
1
0
18 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
64
19
0
18 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
140
20
0
18 May 2023
Controllable Speaking Styles Using a Large Language Model
A. Sigurgeirsson
Simon King
57
3
0
17 May 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion
Xintao Zhao
Shuai Wang
Yang Chao
Zhiyong Wu
Helen Meng
73
3
0
16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
75
14
0
15 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
101
14
0
13 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
103
46
0
11 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
67
6
0
11 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
83
1
0
09 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
83
3
0
09 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
85
4
0
08 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
266
132
0
04 May 2023
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis
Jinlong Xue
Yayue Deng
Fengping Wang
Ya Li
Yingming Gao
J. Tao
Jianqing Sun
Jiaen Liang
75
10
0
03 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
117
1
0
26 Apr 2023
Foley Sound Synthesis at the DCASE 2023 Challenge
Keunwoo Choi
Jae-Yeol Im
Laurie M. Heller
Brian McFee
Keisuke Imoto
Yuki Okamoto
Mathieu Lagrange
Shinosuke Takamichi
78
32
0
25 Apr 2023
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
Kenichi Fujita
Takanori Ashihara
Hiroki Kanagawa
Takafumi Moriya
Yusuke Ijima
92
11
0
24 Apr 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
237
152
0
24 Apr 2023
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
84
6
0
13 Apr 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Jiatong Shi
Yun Tang
Ann Lee
Hirofumi Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
77
9
0
10 Apr 2023
Previous
1
2
3
...
14
15
16
...
22
23
24
Next