arXiv:2010.05646
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,107 papers shown
Title
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
18
30
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
27
51
0
31 Mar 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
Yuma Koizumi
Heiga Zen
Kohei Yatabe
Nanxin Chen
M. Bacchiani
DiffM
38
45
0
31 Mar 2022
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification
Saurabh Kataria
Jesús Villalba
Laureano Moro-Velazquez
Najim Dehak
19
3
0
30 Mar 2022
Generative Spoken Dialogue Language Modeling
Tu Nguyen
Eugene Kharitonov
Jade Copet
Yossi Adi
Wei-Ning Hsu
...
Paden Tomasello
Robin Algayres
Benoît Sagot
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
49
81
0
30 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
12
14
0
29 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Junrui Ni
Liming Wang
Heting Gao
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
17
25
0
29 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
Andros Tjandra
S. Sakti
VLM
8
3
0
29 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Sunghwan Ahn
Joun Yeop Lee
N. Kim
47
25
0
29 Mar 2022
VoiceMe: Personalized voice generation in TTS
Pol van Rijn
Silvan Mertes
Dominik Schiller
Piotr Dura
Hubert Siuzdak
Peter M. C. Harrison
Elisabeth André
Nori Jacoby
30
9
0
29 Mar 2022
Neural Vocoder is All You Need for Speech Super-resolution
Haohe Liu
W. Choi
Xubo Liu
Qiuqiang Kong
Qiao Tian
DeLiang Wang
SupR
DRL
33
42
0
28 Mar 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
32
10
0
28 Mar 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent
Yuki Saito
Yuto Nishimura
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
19
12
0
28 Mar 2022
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
18
4
0
28 Mar 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Sangjun Park
Kihyun Choo
Joohyung Lee
A. Porov
Konstantin Osipov
June Sig Sung
24
6
0
27 Mar 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
Max W. Y. Lam
Jun Wang
Dan Su
Dong Yu
DiffM
41
92
0
25 Mar 2022
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement
Pavel Andreev
Aibek Alanov
Oleg Ivanov
Dmitry Vetrov
38
38
0
24 Mar 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
Takaaki Saeki
Shinnosuke Takamichi
Tomohiko Nakamura
Naoko Tanji
Hiroshi Saruwatari
33
6
0
24 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
26
21
0
24 Mar 2022
The VoicePrivacy 2022 Challenge Evaluation Plan
N. Tomashenko
Xin Wang
Xiaoxiao Miao
Hubert Nourtel
Pierre Champion
Massimiliano Todisco
Emmanuel Vincent
Nicholas W. D. Evans
Junichi Yamagishi
J. Bonastre
34
62
0
23 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
28
12
0
23 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
27
14
0
22 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Gašper Beguš
Alan Zhou
SSL
27
5
0
22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
16
2
0
21 Mar 2022
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
26
16
0
21 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan
Yongbin Feng
Mingming Ye
Cheng Tuo
Minghang Zhang
22
3
0
18 Mar 2022
DGC-vector: A new speaker embedding for zero-shot voice conversion
Ruitong Xiao
Haitong Zhang
Yue Lin
26
12
0
18 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
28
109
0
14 Mar 2022
Reproducible Subjective Evaluation
Max Morrison
Brian Tang
Gefei Tan
Bryan Pardo
22
6
0
08 Mar 2022
Practical cognitive speech compression
Reza Lotfidereshgi
P. Gournay
35
2
0
08 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
30
29
0
07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
26
0
0
06 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
33
60
0
04 Mar 2022
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows
Kevin J. Shih
Rafael Valle
Rohan Badlani
J. F. Santos
Bryan Catanzaro
36
4
0
03 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
185
7
0
02 Mar 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
28
4
0
01 Mar 2022
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training
Ramon Sanabria
Wei-Ning Hsu
Alexei Baevski
Michael Auli
27
7
0
01 Mar 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu
Chengxi Li
Yi Ren
Zhiying Zhu
Zhou Zhao
DiffM
35
16
0
27 Feb 2022
Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
66
25
0
26 Feb 2022
Wavebender GAN: An architecture for phonetically meaningful speech manipulation
Gustavo Teodoro Döhler Beck
Ulme Wennberg
Zofia Malisz
G. Henter
AI4CE
32
8
0
22 Feb 2022
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme
Jianhao Ye
Hongbin Zhou
Zhiba Su
Wendi He
Kaimeng Ren
Lin Li
Heng Lu
31
4
0
22 Feb 2022
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
Bo Zhao
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DiffM
26
22
0
22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
KELM
27
18
0
21 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
40
41
0
16 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing
Eugene Kharitonov
Jade Copet
Kushal Lakhotia
Tu Nguyen
Paden Tomasello
...
A. Elkahky
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
33
32
0
15 Feb 2022
SpeechPainter: Text-conditioned Speech Inpainting
Zalan Borsos
Matthew Sharifi
Marco Tagliasacchi
16
26
0
15 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong
Cong Zhou
Taylor Berg-Kirkpatrick
Julian McAuley
27
17
0
12 Feb 2022
Conditional Diffusion Probabilistic Model for Speech Enhancement
Yen-Ju Lu
Zhongqiu Wang
Shinji Watanabe
Alexander Richard
Cheng Yu
Yu Tsao
DiffM
31
178
0
10 Feb 2022
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training
Zehua Chen
Xu Tan
Ke Wang
Shifeng Pan
Danilo Mandic
Lei He
Sheng Zhao
DiffM
33
28
0
08 Feb 2022