WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 October 2018

Papers citing "WaveGlow: A Flow-based Generative Network for Speech Synthesis"

50 / 525 papers shown

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion Hao Liu Tao Wang Jie Cao Ran He J. Tao DiffM 11 3 0 09 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality Fabian Kögel Bac Nguyen Fabien Cardinaux 14 2 0 02 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis Hubert Siuzdak 32 79 0 01 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model A. Iashchenko Pavel Andreev Ivan Shchekotov Nicholas Babaev Dmitry Vetrov DiffM 21 1 0 01 Jun 2023
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks L. Tóth Amin Honarmandi Shandiz G. Gosztolya T. Csapó 24 3 0 30 May 2023
Towards single integrated spoofing-aware speaker verification embeddings Sung Hwan Mun Hye-jin Shim Hemlata Tak Xin Wang Xuechen Liu ... Junichi Yamagishi Nicholas W. D. Evans Tomi Kinnunen N. Kim Jee-weon Jung 46 11 0 30 May 2023
Towards generalizing deep-audio fake detection networks Konstantin Gasenzer Moritz Wolter 36 4 0 22 May 2023
Textually Pretrained Speech Language Models Michael Hassid Tal Remez Tu Nguyen Itai Gat Alexis Conneau ... Alexandre Défossez Gabriel Synnaeve Emmanuel Dupoux Roy Schwartz Yossi Adi VLM SyDa 44 53 0 22 May 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation Zhe Ye Wei Xue Xuejiao Tan Qi-fei Liu Yi-Ting Guo 26 2 0 22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra Yang Ai Zhenhua Ling 34 13 0 13 May 2023
Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion Yanzhen Ren Hongcheng Zhu Liming Zhai Zongkun Sun Rubing Shen Lina Wang 33 6 0 09 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis Ye-Xin Lu Yang Ai Zhenhua Ling 24 1 0 26 Apr 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model Jianzong Wang Xulong Zhang Haobin Tang Aolan Sun Ning Cheng Jing Xiao 26 1 0 23 Apr 2023
Affective social anthropomorphic intelligent system Md. Adyelullahil Mamun Hasnat Md. Abdullah Md. Golam Rabiul Alam Muhammad Mehedi Hassan Md. Zia Uddin 17 1 0 19 Apr 2023
Neural Diffeomorphic Non-uniform B-spline Flows S. Hong S. Chun 37 1 0 07 Apr 2023
AraSpot: Arabic Spoken Command Spotting Mahmoud Salhab H. Harmanani 28 0 0 29 Mar 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Shogo Seki 34 9 0 24 Mar 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Junaid Qadir 42 47 0 21 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture Julien Hauret Thomas Joubaud V. Zimpfer Éric Bavu 21 6 0 17 Mar 2023
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets Weixin Chen D. Song Bo-wen Li DiffM 34 74 0 10 Mar 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui Yukiya Hono Kei Sawada CVBM 16 6 0 28 Feb 2023
Conditional deep generative models as surrogates for spatial field solution reconstruction with quantified uncertainty in Structural Health Monitoring applications Nicholas E. Silionis Theodora Liangou K. Anyfantis AI4CE 26 0 0 14 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages Sudhanshu Srivastava Ishika Gupta Anusha Prakash Jom Kuriakose H. Murthy VLM 21 1 0 13 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Francesco Ferroni Bryan Catanzaro 16 6 0 24 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Chengyi Wang Sanyuan Chen Yu-Huan Wu Zi-Hua Zhang Long Zhou ... Huaming Wang Jinyu Li Lei He Sheng Zhao Furu Wei 48 644 0 05 Jan 2023
Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling Amitay Sicherman Yossi Adi 20 32 0 02 Jan 2023
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity Ahmed Mustafa J. Valin Jan Büthe Paris Smaragdis Mike Goodwin 30 4 0 08 Dec 2022
On the Robustness of Normalizing Flows for Inverse Problems in Imaging Seongmin Hong I. Park S. Chun 33 7 0 08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning Ankur Debnath Shridevi S Patil Gangotri Nadiger R. Ganesan 26 20 0 07 Dec 2022
Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech Dominik Wagner Sebastian P. Bayerl H. A. C. Maruri Tobias Bocklet 24 7 0 04 Dec 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices O. Watts Lovisa Wihlborg Cassia Valentini-Botinhao 33 3 0 25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs Muyang Du Chuan Liu Jiaxing Qi Junjie Lai 24 1 0 25 Nov 2022
STGlow: A Flow-based Generative Framework with Dual Graphormer for Pedestrian Trajectory Prediction Rongqin Liang Yuanman Li Jiantao Zhou Xia Li 39 12 0 21 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar Praveen S. V. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 38 18 0 17 Nov 2022
Challenges in creative generative models for music: a divergence maximization perspective Axel Chemla-Romeu-Santos P. Esling 18 4 0 16 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 39 12 0 13 Nov 2022
Online Phase Reconstruction via DNN-based Phase Differences Estimation Yoshiki Masuyama Kohei Yatabe Kento Nagatomo Yasuhiro Oikawa 3DV 16 7 0 12 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP Kun Song Yongmao Zhang Yinjiao Lei Jian Cong Hanzhao Li Linfu Xie Gang He Jinfeng Bai 61 15 0 02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU Hiroki Kanagawa Yusuke Ijima 16 0 0 02 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS Kun Song Jian Cong Xinsheng Wang Yongmao Zhang Linfu Xie Ning Jiang Haiying Wu 27 0 0 31 Oct 2022
Audio Time-Scale Modification with Temporal Compressing Networks Ernie Chu Ju-Ting Chen Chia-Ping Chen 25 0 0 31 Oct 2022
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution Chin-Yun Yu Sung-Lin Yeh Gyorgy Fazekas Hao Tang DiffM 40 20 0 27 Oct 2022
Cover Reproducible Steganography via Deep Generative Models Kejiang Chen Hang Zhou Yaofei Wang Meng Li Weiming Zhang Neng H. Yu DiffM 31 9 0 26 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang 36 0 0 24 Oct 2022
Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation Martin Strauss Matteo Torcoli B. Edler 21 4 0 21 Oct 2022
Robust One-Shot Singing Voice Conversion Naoya Takahashi M. Singh Yuki Mitsufuji DiffM 25 8 0 20 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders Xin Wang Junichi Yamagishi 26 36 0 19 Oct 2022
Invertible Monotone Operators for Normalizing Flows Byeongkeun Ahn Chiyoon Kim Youngjoon Hong Hyunwoo J. Kim TPM 43 8 0 15 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder Naoya Takahashi Mayank Kumar Singh Yuki Mitsufuji DiffM 21 16 0 14 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection Piotr Kawa Marcin Plata P. Syga 37 14 0 12 Oct 2022