WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 October 2018

Papers citing "WaveGlow: A Flow-based Generative Network for Speech Synthesis"

Showing 50 of 525 citing papers

JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech D. Lim Sunghee Jung Eesung Kim 19 51 0 31 Mar 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping Yuma Koizumi Heiga Zen Kohei Yatabe Nanxin Chen M. Bacchiani DiffM 33 45 0 31 Mar 2022
Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers Ziyue Xiang Paolo Bestagini Stefano Tubaro Edward J. Delp 28 10 0 30 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji Andros Tjandra S. Sakti VLM 8 3 0 29 Mar 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge Sangjun Park Kihyun Choo Joohyung Lee A. Porov Konstantin Osipov June Sig Sung 14 6 0 27 Mar 2022
A Neural Vocoder Based Packet Loss Concealment Algorithm Yaofeng Zhou C. Bao 23 2 0 26 Mar 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis Max W. Y. Lam Jun Wang Dan Su Dong Yu DiffM 36 92 0 25 Mar 2022
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement Pavel Andreev Aibek Alanov Oleg Ivanov Dmitry Vetrov 38 38 0 24 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis Rishabh Jain Mariam Yiwere Dan Bigioi Peter Corcoran H. Cucu 24 14 0 22 Mar 2022
TO-FLOW: Efficient Continuous Normalizing Flows with Temporal Optimization adjoint with Moving Speed Shian Du Yihong Luo Wei Chen Jian Xu Delu Zeng 32 7 0 19 Mar 2022
A³T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing Richard He Bai Renjie Zheng Junkun Chen Xintong Li Mingbo Ma Liang Huang 24 49 0 18 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows Thomas Merritt Abdelhamid Ezzerg Piotr Bilinski Magdalena Proszewska Kamil Pokora Roberto Barra-Chicote Daniel Korzekwa 36 14 0 15 Mar 2022
PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows Aihua Mao Zihui Du Yu-Hui Wen Jun-ying Xuan Yong-Jin Liu 27 28 0 11 Mar 2022
Practical cognitive speech compression Reza Lotfidereshgi P. Gournay 32 2 0 08 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 25 29 0 07 Mar 2022
NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation Tao Wang Ruibo Fu Jiangyan Yi J. Tao Zhengqi Wen 9 2 0 05 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Takuhiro Kaneko Kou Tanaka Hirokazu Kameoka Shogo Seki 25 60 0 04 Mar 2022
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values Ahmed Imtiaz Humayun Randall Balestriero Richard Baraniuk 24 31 0 03 Mar 2022
Real time spectrogram inversion on mobile phone Oleg Rybakov Marco Tagliasacchi Yunpeng Li Liyang Jiang Xia Zhang Fadi Biadsy 21 4 0 01 Mar 2022
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation Krishna Subramani J. Valin Umut Isik Paris Smaragdis A. Krishnaswamy 29 11 0 23 Feb 2022
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet J. Valin Umut Isik Paris Smaragdis A. Krishnaswamy 29 4 0 22 Feb 2022
Wavebender GAN: An architecture for phonetically meaningful speech manipulation Gustavo Teodoro Döhler Beck Ulme Wennberg Zofia Malisz G. Henter AI4CE 27 8 0 22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 27 18 0 21 Feb 2022
It's Raw! Audio Generation with State-Space Models Karan Goel Albert Gu Chris Donahue Christopher Ré 16 186 0 20 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing Eugene Kharitonov Jade Copet Kushal Lakhotia Tu Nguyen Paden Tomasello ... A. Elkahky Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 33 32 0 15 Feb 2022
Visual Acoustic Matching Changan Chen Ruohan Gao P. Calamia Kristen Grauman 21 56 0 14 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis Hao-Wen Dong Cong Zhou Taylor Berg-Kirkpatrick Julian McAuley 27 17 0 12 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge Ziyi Chen Hua Hua Yuxiang Zhang Ming Li Pengyuan Zhang 27 0 0 29 Jan 2022
ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation Shoule Wu Ziqiang Shi DiffM 280 9 0 29 Jan 2022
Invertible Voice Conversion Zexin Cai Ming Li BDL 27 1 0 26 Jan 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 25 2 0 25 Jan 2022
Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals Haohan Guo Zhiping Zhou Fanbo Meng Kai-Chun Liu 50 16 0 25 Jan 2022
Hiding Data in Colors: Secure and Lossless Deep Image Steganography via Conditional Invertible Neural Networks Yanzhen Ren Ting Liu Liming Zhai Lina Wang 9 7 0 19 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 25 95 0 19 Jan 2022
Audio representations for deep learning in sound synthesis: A review Anastasia Natsiou Seán O'Leary AI4TS 24 18 0 07 Jan 2022
A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram Anastasia Natsiou Seán O'Leary 22 3 0 07 Jan 2022
Solving time dependent Fokker-Planck equations via temporal normalizing flow Xiaodong Feng Li Zeng Tao Zhou AI4CE 36 25 0 28 Dec 2021
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus Rongjie Huang Feiyang Chen Yi Ren Jinglin Liu Chenye Cui Zhou Zhao 33 98 0 20 Dec 2021
Generate Point Clouds with Multiscale Details from Graph-Represented Structures Ximing Yang Zhibo Zhang Zhengfu He Cheng Jin 3DPC 20 1 0 13 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 38 23 0 09 Dec 2021
Multi-speaker Emotional Text-to-speech Synthesizer Sungjae Cho Soo-Young Lee 10 0 0 07 Dec 2021
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 185 379 0 04 Dec 2021
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey Zahra Khanjani Gabrielle Watson V. P Janeja 25 25 0 28 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance Heeseung Kim Sungwon Kim Sungroh Yoon DiffM BDL 19 107 0 23 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 19 16 0 19 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Aimilios Chalamandaris Georgia Maniati Panos Kakoulidis S. Raptis June Sig Sung Hyoungmin Park Pirros Tsiakoulis 11 36 0 17 Nov 2021
CAESynth: Real-Time Timbre Interpolation and Pitch Control with Conditional Autoencoders Aaron Valero Puche Sukhan Lee 21 1 0 09 Nov 2021
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis Antoine Caillon P. Esling DRL 27 109 0 09 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schönherr DiffM 129 123 0 04 Nov 2021
Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework Jonas Köhler Maarten C. Ottenhoff Sophocles Goulis Miguel Angrick A. Colon Louis Wagner S. Tousseyn P. Kubben Christian Herff 30 25 0 02 Nov 2021