WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 October 2018

Papers citing "WaveGlow: A Flow-based Generative Network for Speech Synthesis"

50 / 525 papers shown

Combined Generative and Predictive Modeling for Speech Super-resolution Heming Wang Eric W. Healy DeLiang Wang DiffM 33 0 0 25 Jan 2024
Contractive Diffusion Probabilistic Models Wenpin Tang Hanyang Zhao DiffM 49 12 0 23 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis Prabhav Agrawal Thilo Köhler Zhiping Xiu Prashant Serai Qing He 26 1 0 19 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder Tan Dat Nguyen Ji-Hoon Kim Youngjoon Jang Jaehun Kim Joon Son Chung DiffM 44 5 0 18 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment Hyoung-Seok Oh Sang-Hoon Lee Deok-Hyun Cho Seong-Whan Lee 52 1 0 16 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering Ya-Zhen Song Zhuo Chen Xiaofei Wang Ziyang Ma Xie Chen AuLLM 21 36 0 14 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech Muyang Du Chuan Liu Junjie Lai 23 0 0 03 Jan 2024
Creating New Voices using Normalizing Flows Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Kamil Pokora Sebastian Cygert K. Yanagisawa Roberto Barra-Chicote Daniel Korzekwa 26 17 0 22 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Xueyao Zhang Liumeng Xue Yicheng Gu Yuancheng Wang Haorui He ... Mingxuan Wang Jun Han Kai Chen Haizhou Li Zhizheng Wu 29 28 0 15 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning Raviraj Joshi Nikesh Garera 33 0 0 02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints Raviraj Joshi Nikesh Garera 27 0 0 02 Dec 2023
THInImg: Cross-modal Steganography for Presenting Talking Heads in Images Lin Zhao Hongxuan Li Xuefei Ning Xinru Jiang 35 1 0 28 Nov 2023
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder Yicheng Gu Xueyao Zhang Liumeng Xue Zhizheng Wu 29 11 0 25 Nov 2023
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness Mathias Vogel DiffM 45 0 0 17 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores Daniel Y. Fu Hermann Kumbong Eric N. D. Nguyen Christopher Ré VLM 41 29 0 10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them Muhammad Ali Farooq Dan Bigioi Rishabh Jain Wang Yao Mariam Yiwere Peter Corcoran 27 0 0 08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Rishabh Jain Peter Corcoran 28 0 0 07 Nov 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection Sahibzada Adil Shahzad Ammarah Hashmi Yan-Tsung Peng Yu Tsao Hsin-Min Wang 34 5 0 05 Nov 2023
Flexible Tails for Normalising Flows, with Application to the Modelling of Financial Return Data Tennessee Hickling Dennis Prangle 24 4 0 01 Nov 2023
Enabling Acoustic Audience Feedback in Large Virtual Events Tamay Aykut M. Hofbauer Christopher B. Kuhn Eckehard Steinbach Bernd Girod 55 0 0 27 Oct 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 27 31 0 25 Oct 2023
An overview of text-to-speech systems and media applications Mohammad Reza Hasanabadi 13 3 0 22 Oct 2023
Energy-Based Models For Speech Synthesis Wanli Sun Zehai Tu Anton Ragni DiffM 26 0 0 19 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens Robin Algayres Yossi Adi Tu Nguyen Jade Copet Gabriel Synnaeve Benoît Sagot Emmanuel Dupoux AuLLM 46 12 0 08 Oct 2023
Unified speech and gesture synthesis using flow matching Shivam Mehta Ruibo Tu Simon Alexanderson Jonas Beskow Éva Székely G. Henter 45 3 0 08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023 Yi-Hua Zhou Meng Chen Yi Lei Jihua Zhu Weifeng Zhao 21 5 0 08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset Ze Liu 24 0 0 08 Oct 2023
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model Yayun He Zuheng Kang Jianzong Wang Junqing Peng Jing Xiao DiffM 19 2 0 07 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue Kentaro Mitsui Yukiya Hono Kei Sawada 31 13 0 02 Oct 2023
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer Peter Ochieng DiffM 30 0 0 18 Sep 2023
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? Xin Wang Junichi Yamagishi SyDa 58 23 0 12 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching Shivam Mehta Ruibo Tu Jonas Beskow Éva Székely G. Henter 24 72 0 06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network Takashi Shibuya Yuhta Takida Yuki Mitsufuji 18 11 0 06 Sep 2023
Generative-based Fusion Mechanism for Multi-Modal Tracking Zhangyong Tang Tianyang Xu Xuefeng Zhu Xiaojun Wu Josef Kittler DiffM 26 31 0 04 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis B. Hayes Jordie Shier Gyorgy Fazekas Andrew Mcpherson C. Saitis 27 21 0 29 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos Ji-Hoon Kim Jaehun Kim Joon Son Chung 32 5 0 29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Min Zhang Björn W. Schuller LM&MA AuLLM 35 38 0 24 Aug 2023
WavMark: Watermarking for Audio Generation Guang Chen Yu-Huan Wu Shujie Liu Tao Liu Xiaoyong Du Furu Wei 25 33 0 24 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Shogo Seki 33 4 0 14 Aug 2023
Image Synthesis under Limited Data: A Survey and Taxonomy Mengping Yang Zhe Wang 28 8 0 31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design Jungil Kong Jihoon Park Beomjeong Kim Jeongmin Kim Dohee Kong Sangjin Kim 37 36 0 31 Jul 2023
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase Yoshiki Masuyama Natsuki Ueno Nobutaka Ono 14 1 0 23 Jul 2023
PartDiff: Image Super-resolution with Partial Diffusion Models Kai Zhao A. Hung Kai-Lin Pang Haoxin Zheng Kyunghyun Sung DiffM MedIm 25 3 0 21 Jul 2023
Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables Chin-Yun Yu Gyorgy Fazekas 33 7 0 29 Jun 2023
MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning Mohammad Reza Hasanabadi 19 3 0 22 Jun 2023
HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising Li Pang Weizhen Gu Xiangyong Cao Xiangyu Rui Jiangjun Peng Shuang Xu Gang Yang Deyu Meng 17 0 0 20 Jun 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction Wenzhe Liu Yupeng Shi Jun Chen Wei Rao Shulin He Andong Li Yannan Wang Zhiyong Wu 24 6 0 14 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models Ji-Sang Hwang Sang-Hoon Lee Seong-Whan Lee DiffM 38 8 0 12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN Rithesh Kumar Prem Seetharaman Alejandro Luebs I. Kumar Kundan Kumar 56 288 0 11 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities J. P. Cardenuto Jing Yang Rafael Padilha Renjie Wan Daniel Moreira Haoliang Li Shiqi Wang Fernanda A. Andaló Sébastien Marcel Anderson de Rezende Rocha DeLMO 42 29 0 09 Jun 2023