HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

12 October 2020

Papers citing "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"

50 / 1,154 papers shown

Title
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping Yuma Koizumi Heiga Zen Kohei Yatabe Nanxin Chen M. Bacchiani DiffM 103 49 0 31 Mar 2022
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification Saurabh Kataria Jesús Villalba Laureano Moro-Velazquez Najim Dehak 46 3 0 30 Mar 2022
Generative Spoken Dialogue Language Modeling Tu Nguyen Eugene Kharitonov Jade Copet Yossi Adi Wei-Ning Hsu ... Paden Tomasello Robin Algayres Benoît Sagot Abdel-rahman Mohamed Emmanuel Dupoux AuLLM 126 88 0 30 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion Zijiang Yang Xin Jing Andreas Triantafyllopoulos Meishu Song Ilhan Aslan Björn W. Schuller 73 14 0 29 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition Junrui Ni Liming Wang Heting Gao Kaizhi Qian Yang Zhang Shiyu Chang M. Hasegawa-Johnson 78 25 0 29 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji Andros Tjandra S. Sakti VLM 60 4 0 29 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus Minchan Kim Myeonghun Jeong Byoung Jin Choi Sunghwan Ahn Joun Yeop Lee N. Kim 113 26 0 29 Mar 2022
VoiceMe: Personalized voice generation in TTS Pol van Rijn Silvan Mertes Dominik Schiller Piotr Dura Hubert Siuzdak Peter M. C. Harrison Elisabeth André Nori Jacoby 64 9 0 29 Mar 2022
Neural Vocoder is All You Need for Speech Super-resolution Haohe Liu W. Choi Xubo Liu Qiuqiang Kong Qiao Tian DeLiang Wang SupR DRL 106 44 0 28 Mar 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions Xiaoxiao Miao Xin Wang Erica Cooper Junichi Yamagishi N. Tomashenko 70 11 0 28 Mar 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent Yuki Saito Yuto Nishimura Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari 128 12 0 28 Mar 2022
vTTS: visual-text to speech Yoshifumi Nakano Takaaki Saeki Shinnosuke Takamichi Katsuhito Sudoh Hiroshi Saruwatari 61 4 0 28 Mar 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge Sangjun Park Kihyun Choo Joohyung Lee A. Porov Konstantin Osipov June Sig Sung 72 6 0 27 Mar 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis Max W. Y. Lam Jun Wang Jane Polak Scowcroft Dong Yu DiffM 105 97 0 25 Mar 2022
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement Pavel Andreev Aibek Alanov Oleg Ivanov Dmitry Vetrov 104 43 0 24 Mar 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling Takaaki Saeki Shinnosuke Takamichi Tomohiko Nakamura Naoko Tanji Hiroshi Saruwatari 84 6 0 24 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion Xintao Zhao Feng Liu Changhe Song Zhiyong Wu Shiyin Kang Deyi Tuo Helen Meng 85 21 0 24 Mar 2022
The VoicePrivacy 2022 Challenge Evaluation Plan N. Tomashenko Xin Wang Xiaoxiao Miao Hubert Nourtel Pierre Champion Massimiliano Todisco Emmanuel Vincent Nicholas W. D. Evans Junichi Yamagishi J. Bonastre 117 63 0 23 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 65 12 0 23 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis Rishabh Jain Mariam Yiwere Dan Bigioi Peter Corcoran H. Cucu 77 14 0 22 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data Gašper Beguš Alan Zhou SSL 127 5 0 22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling Bac Nguyen Fabien Cardinaux Stefan Uhlich 34 2 0 21 Mar 2022
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses Zewang Zhang Yibin Zheng Xinhui Li Li Lu 131 17 0 21 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice Xin Yuan Yongbin Feng Mingming Ye Cheng Tuo Minghang Zhang 133 3 0 18 Mar 2022
DGC-vector: A new speaker embedding for zero-shot voice conversion Ruitong Xiao Haitong Zhang Yue Lin 62 12 0 18 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities Hsiang-Sheng Tsai Heng-Jui Chang Wen-Chin Huang Zili Huang Kushal Lakhotia ... Hsuan-Jui Chen Shang-Wen Li Shinji Watanabe Abdel-rahman Mohamed Hung-yi Lee 93 110 0 14 Mar 2022
Reproducible Subjective Evaluation Max Morrison Brian Tang Gefei Tan Bryan Pardo 65 7 0 08 Mar 2022
Practical cognitive speech compression Reza Lotfidereshgi P. Gournay 59 2 0 08 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 102 29 0 07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning Qingyu Xing Xiaohan Ma 138 0 0 06 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Takuhiro Kaneko Kou Tanaka Hirokazu Kameoka Shogo Seki 89 62 0 04 Mar 2022
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows Kevin J. Shih Rafael Valle Rohan Badlani J. F. Santos Bryan Catanzaro 60 4 0 03 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS Haohan Guo Hui Lu Xixin Wu Helen Meng 360 7 0 02 Mar 2022
Real time spectrogram inversion on mobile phone Oleg Rybakov Marco Tagliasacchi Yunpeng Li Liyang Jiang Xia Zhang Fadi Biadsy 146 4 0 01 Mar 2022
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training Ramon Sanabria Wei-Ning Hsu Alexei Baevski Michael Auli 71 7 0 01 Mar 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier Jinglin Liu Chengxi Li Yi Ren Zhiying Zhu Zhou Zhao DiffM 96 17 0 27 Feb 2022
Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models Xiaoxiao Miao Xin Wang Erica Cooper Junichi Yamagishi N. Tomashenko 185 25 0 26 Feb 2022
Wavebender GAN: An architecture for phonetically meaningful speech manipulation Gustavo Teodoro Döhler Beck Ulme Wennberg Zofia Malisz G. Henter AI4CE 94 8 0 22 Feb 2022
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme Jianhao Ye Hongbin Zhou Zhiba Su Wendi He Kaimeng Ren Lin Li Heng Lu 55 4 0 22 Feb 2022
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech Bo Zhao Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao DiffM 96 22 0 22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 86 20 0 21 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech Yi Ren Ming Lei Zhiying Huang Shi-Rui Zhang Qian Chen Zhijie Yan Zhou Zhao 96 43 0 16 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing Eugene Kharitonov Jade Copet Kushal Lakhotia Tu Nguyen Paden Tomasello ... A. Elkahky Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 134 34 0 15 Feb 2022
SpeechPainter: Text-conditioned Speech Inpainting Zalan Borsos Matthew Sharifi Marco Tagliasacchi 98 28 0 15 Feb 2022
Visual Acoustic Matching Changan Chen Ruohan Gao P. Calamia Kristen Grauman 81 58 0 14 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis Hao-Wen Dong Cong Zhou Taylor Berg-Kirkpatrick Julian McAuley 85 17 0 12 Feb 2022
Conditional Diffusion Probabilistic Model for Speech Enhancement Yen-Ju Lu Zhongqiu Wang Shinji Watanabe Alexander Richard Cheng Yu Yu Tsao DiffM 87 191 0 10 Feb 2022
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training Zehua Chen Xu Tan Ke Wang Shifeng Pan Danilo Mandic Lei He Sheng Zhao DiffM 82 31 0 08 Feb 2022
PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech Srikanth Korse N. Pia Kishan Gupta Guillaume Fuchs 99 15 0 31 Jan 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge Ziyi Chen Hua Hua Yuxiang Zhang Ming Li Pengyuan Zhang 110 0 0 29 Jan 2022