
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech Yusuke Nakai Yuki Saito K. Udagawa Hiroshi Saruwatari AAML 95 1 0 26 Sep 2022
NWPU-ASLP System for the VoicePrivacy 2022 Challenge Jixun Yao Qing Wang Li Zhang Pengcheng Guo Yuhao Liang Linfu Xie PICV 80 17 0 24 Sep 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Mei-Shuo Chen Z. Duan 108 11 0 23 Sep 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS Haohan Guo Fenglong Xie Frank Soong Xixin Wu Helen M. Meng 90 12 0 22 Sep 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu Pengkai Yin Rui Liu F. Bao Guanglai Gao 51 5 0 22 Sep 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 87 6 0 22 Sep 2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN Yin-Ping Cho Yu Tsao Hsin-Min Wang Yi-Wen Liu DiffM 103 9 0 21 Sep 2022
MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement Jianrong Wang Xiaomin Li Xuewei Li Mei Yu Qiang Fang Li Liu 72 0 0 15 Sep 2022
Open Challenges in Synthetic Speech Detection Luca Cuccovillo Christoforos Papastergiopoulos Anastasios Vafeiadis Artem Yaroshchuk P. Aichroth K. Votis Dimitrios Tzovaras 94 29 0 15 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 77 23 0 14 Sep 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe Louis Goldstein A. Black Gopala K. Anumanchipalli 78 26 0 13 Sep 2022
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion Ruibin Yuan Yuxuan Wu Jacob Li Jaxter Kim 120 5 0 09 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation Zalan Borsos Raphaël Marinier Damien Vincent Eugene Kharitonov Olivier Pietquin ... Dominik Roblek O. Teboul David Grangier Marco Tagliasacchi Neil Zeghidour AuLLM 200 617 0 07 Sep 2022
Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation Yen-Tung Yeh Bo-Yu Chen Yi-Hsuan Yang 88 6 0 05 Sep 2022
Mel Spectrogram Inversion with Stable Pitch Bruno Di Giorgi M. Levy Richard Sharp 97 6 0 26 Aug 2022
Music Separation Enhancement with Generative Modeling N. Schaffer Boaz Cogan Ethan Manilow Max Morrison Prem Seetharaman Bryan Pardo 73 9 0 26 Aug 2022
Are disentangled representations all you need to build speaker anonymization systems? Pierre Champion D. Jouvet Anthony Larcher 115 20 0 22 Aug 2022
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio Xin Yan Jiangyan Yi J. Tao Chenglong Wang Haoxin Ma Tao Wang Shiming Wang Ruibo Fu 78 34 0 20 Aug 2022
Pathway to Future Symbiotic Creativity Yi-Ting Guo Qi-fei Liu Jie Chen Wei Xue Jie Fu ... Fernando Rosas Jeffrey Shaw Xing Wu Jiji Zhang Jianliang Xu 75 0 0 18 Aug 2022
Musika! Fast Infinite Waveform Music Generation Marco Pasini Jan Schluter MGen 55 31 0 18 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset Xiang Li Changhe Song X. Wei Zhiyong Wu Jia Jia Helen Meng 64 4 0 10 Aug 2022
DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation Da-Yi Wu Wen-Yi Hsiao Fu-Rong Yang Oscar D. Friedman Warren Jackson Scott Bruzenak Yi-Wen Liu Yi-Hsuan Yang DiffM 117 24 0 09 Aug 2022
BSDGAN: Balancing Sensor Data Generative Adversarial Networks for Human Activity Recognition Yifan Hu Yu Wang 43 7 0 07 Aug 2022
Customs Import Declaration Datasets Chae-Seong Jeong Sundong Kim Jaewoo Park Yeonsoo Choi 77 3 0 04 Aug 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation Dongchao Yang Jianwei Yu Helin Wang Wen Wang Chao Weng Yuexian Zou Dong Yu DiffM 111 306 0 20 Jul 2022
Latent-Domain Predictive Neural Speech Coding Xue Jiang Xiulian Peng Huaying Xue Yuan Zhang Yan Lu 96 18 0 18 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 139 201 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 92 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 80 5 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 100 14 0 13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection Haoxin Ma Jiangyan Yi Chenglong Wang Xin Yan J. Tao Tao Wang Shiming Wang Ruibo Fu 95 30 0 12 Jul 2022
End-to-end speech recognition modeling from de-identified data M. Flechl Shou-Chun Yin Junho Park Peter Skala 49 5 0 12 Jul 2022
PoeticTTS -- Controllable Poetry Reading for Literary Studies Julia Koch Florian Lux Nadja Schauffler T. Bernhart Felix Dieterle Jonas Kuhn Sandra Richter Gabriel Viehhauser Ngoc Thang Vu 66 5 0 11 Jul 2022
Speaker Anonymization with Phonetic Intermediate Representations Sarina Meyer Florian Lux Pavel Denisov Julia Koch Pascal Tilli Ngoc Thang Vu 88 28 0 11 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 97 25 0 11 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion Wen-Chin Huang Shu-Wen Yang Tomoki Hayashi Tomoki Toda 68 17 0 10 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqiang Wang Zhou Zhao 95 10 0 08 Jul 2022
End-to-End Binaural Speech Synthesis Wen-Chin Huang Dejan Marković Alexander Richard I. D. Gebru Anjali Menon 65 9 0 08 Jul 2022
Ultra-Low-Bitrate Speech Coding with Pretrained Transformers Ali Siahkoohi Michael Chinen Tom Denton W. Kleijn Jan Skoglund 58 9 0 05 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training Zewang Zhang Yibin Zheng Xinhui Li Li Lu DiffM 175 11 0 05 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yinjiao Lei Shan Yang Jian Cong Linfu Xie Jane Polak Scowcroft DiffM 102 12 0 05 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers Liumeng Xue Shan Yang Na Hu Jane Polak Scowcroft Linfu Xie 63 2 0 02 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS Kyle Kastner Aaron Courville 59 0 0 30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 110 27 0 29 Jun 2022
Data Redaction from Pre-trained GANs Zhifeng Kong Kamalika Chaudhuri 161 16 0 29 Jun 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion Dacheng Yin Chuanxin Tang Yanqing Liu Xiaoqiang Wang Zhiyuan Zhao Yucheng Zhao Zhiwei Xiong Sheng Zhao Chong Luo 88 12 0 28 Jun 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder Taejun Bak Junmo Lee Hanbin Bae Jinhyeok Yang Jaesung Bae Young-Sun Joo 109 28 0 27 Jun 2022
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection Piotr Kawa Marcin Plata P. Syga AAML 95 23 0 27 Jun 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech Florian Lux Julia Koch Ngoc Thang Vu 77 20 0 24 Jun 2022
A Study on the Evaluation of Generative Models Eyal Betzalel Coby Penso Aviv Navon Ethan Fetaya EGVM 125 52 0 22 Jun 2022