HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong, Jaehyeon Kim, Jaekyoung Bae
12 October 2020 · arXiv:2010.05646

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 of 1,154 citing papers shown.
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
  Chengyi Wang, Sanyuan Chen, Yu-Huan Wu, Zi-Hua Zhang, Long Zhou, ..., Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei · 05 Jan 2023
Towards Voice Reconstruction from EEG during Imagined Speech
  Young-Eun Lee, Seo-Hyun Lee, Sang-Ho Kim, Seong-Whan Lee · 02 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
  Ze Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, ..., Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic · 30 Dec 2022 · DiffM
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
  Yinghao Aaron Li, Cong Han, N. Mesgarani · 29 Dec 2022
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
  Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann · 22 Dec 2022 · DiffM
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
  Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi · 21 Dec 2022 · DiffM
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
  Gallil Maimon, Yossi Adi · 19 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
  Yusuke Yasuda, Tomoki Toda · 16 Dec 2022 · DiffM
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
  Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, J. Pino · 15 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
  Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh · 15 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
  Kailin Liang, Bin Liu, Yifan Hu, Rui Liu, F. Bao, Guanglai Gao · 11 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
  Ahmed Mustafa, J. Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin · 08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
  Jinze Bai, Rui Men, Han Yang, Xuancheng Ren, Kai Dang, ..., Wenhang Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou · 08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
  Gaoxiang Cong, Liang Li, Yuankai Qi, Zhengjun Zha, Qi Wu, Wen-yu Wang, Bin Jiang, Ming-Hsuan Yang, Qin Huang · 08 Dec 2022
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
  J. Nam, Sangwoo Mo, Jaeho Lee, Jinwoo Shin · 05 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
  Yinjiao Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Linfu Xie, Jane Polak Scowcroft · 03 Dec 2022 · DiffM
Deep neural network techniques for monaural speech enhancement: state of the art analysis
  P. Ochieng · 01 Dec 2022
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
  Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, J. Bonastre, D. Matrouf · 29 Nov 2022
Neural Vocoder Feature Estimation for Dry Singing Voice Separation
  Jae-Yeol Im, Soonbeom Choi, Sangeon Yong, Juhan Nam · 29 Nov 2022
Contextual Expressive Text-to-Speech
  Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kaiqin Hu, Ju Fan, Chang Zhou · 26 Nov 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
  O. Watts, Lovisa Wihlborg, Cassia Valentini-Botinhao · 25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
  Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai · 25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
  Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan · 25 Nov 2022
Voice-preserving Zero-shot Multiple Accent Conversion
  Mumin Jin, Prashant Serai, Jilong Wu, Andros Tjandra, Vimal Manohar, Qing He · 23 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
  Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xuejiao Tan · 22 Nov 2022 · DiffM
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
  Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda · 21 Nov 2022
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
  Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, V. Ithapu, Maja Pantic · 20 Nov 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
  Chenye Cui, Yi Ren, Jinglin Liu, Rongjie Huang, Zhou Zhao · 19 Nov 2022 · VGen
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling
  Xinfa Zhu, Yinjiao Lei, Kun Song, Yongmao Zhang, Tao Li, Linfu Xie · 19 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
  Gokul Karthik Kumar, V. PraveenS., Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar · 17 Nov 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
  Yiwei Guo, Chenpeng Du, Xie Chen, K. Yu · 17 Nov 2022 · DiffM
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
  Hyeong-Seok Choi, Jinhyeok Yang, Juheon Lee, Hyeongju Kim · 17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
  Minki Kang, Dong Min, Sung Ju Hwang · 17 Nov 2022 · DiffM
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
  Yang Xiang, Jesper Lisby Højvang, M. Rasmussen, M. G. Christensen · 16 Nov 2022 · DRL
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
  Leyuan Qu, Wei Wang, C. Weber, F. Ren, Taiha Li, S. Wermter · 16 Nov 2022
Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder
  Yuying Xie, Thomas Arildsen, Zheng-Hua Tan · 15 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
  J. Webber, Cassia Valentini-Botinhao, Evelyn Williams, G. Henter, Simon King · 13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
  Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, G. Henter · 13 Nov 2022 · AI4TS
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
  Li-Wei Chen, Shinji Watanabe, Alexander I. Rudnicky · 12 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
  Yoorim Oh, Juheon Lee, Yoseob Han, Kyogu Lee · 11 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
  Gaku Narita, Junichi Shimizu, Taketo Akama · 10 Nov 2022 · GAN
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
  Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Linfu Xie, Mengxiao Bi · 09 Nov 2022
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
  Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung · 08 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
  Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Linfu Xie, Namin Wang, Jie Liu · 06 Nov 2022
Preserving background sound in noise-robust voice conversion via multi-task learning
  Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Linfu Xie, Hai Li, Junhui Liu, Danming Xie · 06 Nov 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
  Yongmao Zhang, Heyang Xue, Hanzhao Li, Linfu Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong · 05 Nov 2022 · DiffM, VLM
Self-Supervised Learning for Speech Enhancement through Synthesis
  Bryce Irvin, Marko Stamenovic, M. Kegler, Li-Chia Yang · 04 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
  Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou · 04 Nov 2022 · DiffM, VLM
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
  Kun Song, Yongmao Zhang, Yinjiao Lei, Jian Cong, Hanzhao Li, Linfu Xie, Gang He, Jinfeng Bai · 02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU
  Hiroki Kanagawa, Yusuke Ijima · 02 Nov 2022