2010.05646
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,107 papers shown
Title
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
43
6
0
13 Apr 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Jiatong Shi
Yun Tang
Ann Lee
Hirofumi Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
51
9
0
10 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
24
20
0
10 Apr 2023
ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi
Vahid Saeedi
J. Peymanfard
Leila Alhabib
Hossein Zeinali
24
2
0
07 Apr 2023
On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
Yi Zhu
Mohamed Imoussaïne-Aïkous
Carolyn Côté-Lussier
Tiago H. Falk
20
4
0
05 Apr 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Yuancheng Wang
Zeqian Ju
Xuejiao Tan
Lei He
Zhizheng Wu
Jiang Bian
Sheng Zhao
DiffM
19
47
0
03 Apr 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
Jiaheng Liu
DiffM
VGen
28
13
0
29 Mar 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
34
9
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Hauret Julien
Joubaud Thomas
V. Zimpfer
Bavu Éric
21
6
0
17 Mar 2023
PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
Yuning Wu
Jiatong Shi
Tao Qian
Dongji Gao
Qin Jin
33
5
0
15 Mar 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
15
3
0
15 Mar 2023
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
22
14
0
14 Mar 2023
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
Mingshuai Liu
Shubo Lv
Zihan Zhang
Ru Han
Xiang Hao
Xianjun Xia
Li Chen
Yijian Xiao
Linfu Xie
21
6
0
14 Mar 2023
An End-to-End Neural Network for Image-to-Audio Transformation
Liu Chen
Michael Deisher
Munir Georges
26
3
0
10 Mar 2023
Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports
Hyunseung Chung
Jiho Kim
Joon-Myoung Kwon
K. Jeon
Min Sung Lee
Edward Choi
MedIm
11
15
0
09 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
17
7
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
38
173
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
36
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
41
4
0
05 Mar 2023
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
35
1
0
04 Mar 2023
Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints
P. Magron
Tuomas Virtanen
32
0
0
03 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
39
1
0
03 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Jun Rekimoto
48
19
0
03 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
Yingting Li
Ambuj Mehrish
Shuaijiang Zhao
Rishabh Bhardwaj
Amir Zadeh
Navonil Majumder
Rada Mihalcea
Soujanya Poria
AAML
29
16
0
02 Mar 2023
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition
P. Klumpp
Pooja Chitkara
Leda Sari
Prashant Serai
Jilong Wu
Irina-Elena Veliche
Rongqing Huang
Qing He
35
2
0
01 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
25
8
0
01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
27
8
0
28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
18
6
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
45
27
0
27 Feb 2023
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Ke Wang
Tomoki Koriyama
Yuki Saito
Takaaki Saeki
Detai Xin
Hiroshi Saruwatari
28
7
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
28
6
0
27 Feb 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment
Huaying Xue
Xiulian Peng
Yan Lu
54
4
0
26 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
22
3
0
24 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo
Chaoran Liu
C. Ishi
H. Ishiguro
BDL
27
12
0
16 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Speaker-Independent Acoustic-to-Articulatory Speech Inversion
Peter Wu
Li-Wei Chen
Cheol Jun Cho
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
18
26
0
14 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Sudhanshu Srivastava
Ishika Gupta
Anusha Prakash
Jom Kuriakose
H. Murthy
VLM
21
1
0
13 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
19
35
0
08 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
24
191
0
07 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
Yuhta Takida
Masaaki Imaizumi
Takashi Shibuya
Chieh-Hsin Lai
Toshimitsu Uesaka
Naoki Murata
Yuki Mitsufuji
GAN
26
12
0
30 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
151
318
0
30 Jan 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
37
18
0
30 Jan 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
49
473
0
29 Jan 2023
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker
Navjot Kaur
Paige Tuttosi
29
2
0
29 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
55
2
0
26 Jan 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Shahar Lutati
Eliya Nachmani
Lior Wolf
DiffM
40
14
0
25 Jan 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
25
16
0
25 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
40
15
0
21 Jan 2023