Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,107 papers shown
Title
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
61
15
0
02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU
Hiroki Kanagawa
Yusuke Ijima
16
0
0
02 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
48
18
0
01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio
Zexin Cai
Weiqing Wang
Ming Li
24
25
0
01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
35
0
0
31 Oct 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents
Yongmao Zhang
Zhichao Wang
Pei-Yin Yang
Hongshen Sun
Zhisheng Wang
Linfu Xie
41
6
0
31 Oct 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
29
2
0
31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
34
4
0
31 Oct 2022
Magnitude or Phase? A Two Stage Algorithm for Dereverberation
Ayal Schwartz
Sharon Gannot
Shlomo E. Chazan
19
0
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
39
12
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
23
0
0
28 Oct 2022
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit
Ryuichi Yamamoto
Reo Yoneyama
Tomoki Toda
247
11
0
28 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
34
15
0
28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
33
10
0
28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
44
27
0
27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
46
98
0
27 Oct 2022
Explicit Intensity Control for Accented Text-to-speech
Rui Liu
Haolin Zuo
De Hu
Guanglai Gao
Haizhou Li
26
6
0
27 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
182
7
0
27 Oct 2022
Articulation GAN: Unsupervised modeling of articulatory learning
Gašper Beguš
Alan Zhou
Peter Wu
Gopala K Anumanchipalli
GAN
39
8
0
27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo
Fenglong Xie
Xixin Wu
Hui Lu
Helen Meng
163
3
0
27 Oct 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
30
6
0
26 Oct 2022
Full-band General Audio Synthesis with Score-based Diffusion
Santiago Pascual
Gautam Bhattacharya
Chunghsin Yeh
Jordi Pons
Joan Serrà
DiffM
30
33
0
26 Oct 2022
Cover Reproducible Steganography via Deep Generative Models
Kejiang Chen
Hang Zhou
Yaofei Wang
Meng Li
Weiming Zhang
Neng H. Yu
DiffM
31
9
0
26 Oct 2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge
Yuhao Liang
Pei-Ning Chen
F. Yu
Xinfa Zhu
Tianyi Xu
Linfu Xie
33
0
0
26 Oct 2022
EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones
J. Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
17
9
0
25 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
31
5
0
25 Oct 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using
β
β
β
-VAE
Hui Lu
Disong Wang
Xixin Wu
Zhiyong Wu
Xunying Liu
Helen M. Meng
DRL
28
9
0
25 Oct 2022
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
37
612
0
24 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
36
0
0
24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
56
7
0
23 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
40
22
0
21 Oct 2022
Boomerang: Local sampling on image manifolds using diffusion models
Lorenzo Luzi
P. Mayer
Josue Casco-Rodriguez
Ali Siahkoohi
Richard G. Baraniuk
DiffM
37
20
0
21 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
40
8
0
20 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
Chihiro Watanabe
Hirokazu Kameoka
DRL
47
0
0
20 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang
Junichi Yamagishi
26
36
0
19 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Detai Xin
Hiroshi Saruwatari
45
3
0
18 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka
Shinnosuke Takamichi
Keisuke Imoto
Yuki Okamoto
Kazuki Fujii
Hiroshi Saruwatari
DiffM
29
8
0
17 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
27
12
0
14 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar
Singh
Yuki Mitsufuji
DiffM
29
16
0
14 Oct 2022
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy
Sarina Meyer
Pascal Tilli
Pavel Denisov
Florian Lux
Julia Koch
Ngoc Thang Vu
30
31
0
13 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
51
5
0
13 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
27
10
0
12 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
37
14
0
12 Oct 2022
JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
Yueh-Kao Wu
Ching-Yu Chiu
Yi-Hsuan Yang
ViT
26
14
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
27
5
0
12 Oct 2022
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models
Matthew Baas
Herman Kamper
DiffM
40
8
0
11 Oct 2022
ConchShell: A Generative Adversarial Networks that Turns Pictures into Piano Music
Wanshu Fan
Yu-Chuan Su
Yuxin Huang
GAN
16
2
0
11 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gokcce .Iymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
25
53
0
06 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
19
0
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
40
16
0
05 Oct 2022
Previous
1
2
3
...
16
17
18
...
21
22
23
Next