WaveGlow: A Flow-based Generative Network for Speech Synthesis

31 October 2018

Papers citing "WaveGlow: A Flow-based Generative Network for Speech Synthesis"

50 / 525 papers shown

Title
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos Björn W. Schuller Gökçe İymen M. Sezgin Xiangheng He ... Shuo Liu Silvan Mertes Elisabeth André Ruibo Fu Jianhua Tao 20 53 0 06 Oct 2022
How Image Generation Helps Visible-to-Infrared Person Re-Identification? Honghu Pan Yongyong Chen Yunqing He Xin Li Zhenyu He 15 2 0 04 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration Yuma Koizumi Kohei Yatabe Heiga Zen M. Bacchiani DiffM 42 29 0 03 Oct 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling Itai Gat Felix Kreuk Tu Nguyen Ann Lee Jade Copet Gabriel Synnaeve Emmanuel Dupoux Yossi Adi 51 11 0 30 Sep 2022
AutoLV: Automatic Lecture Video Generator Wen Wang Yang Song Sanjay Jha VGen 21 3 0 19 Sep 2022
Open Challenges in Synthetic Speech Detection Luca Cuccovillo Christoforos Papastergiopoulos Anastasios Vafeiadis Artem Yaroshchuk P. Aichroth K. Votis Dimitrios Tzovaras 46 27 0 15 Sep 2022
ConvNeXt Based Neural Network for Audio Anti-Spoofing Qiaowei Ma J. Zhong Yitao Yang Weiheng Liu Yingbo Gao W. W. Ng AAML 44 6 0 14 Sep 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe Louis Goldstein A. Black Gopala K. Anumanchipalli 39 24 0 13 Sep 2022
Evaluating generative audio systems and their metrics Ashvala Vinay Alexander Lerch 24 19 0 31 Aug 2022
Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models John Shelton Hyatt 12 1 0 24 Aug 2022
Pathway to Future Symbiotic Creativity Yi-Ting Guo Qi-fei Liu Jie Chen Wei Xue Jie Fu ... Fernando Rosas Jeffrey Shaw Xing Wu Jiji Zhang Jianliang Xu 34 0 0 18 Aug 2022
Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0 M. S. Al-Radhi Tamás Gábor Csapó Csaba Zainkó Géza Németh 14 1 0 15 Aug 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 36 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 26 5 0 13 Jul 2022
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System Yi-Chiao Wu Patrick Lumban Tobing Kazuki Yasuhara Noriyuki Matsunaga Yamato Ohtani T. Toda 42 0 0 13 Jul 2022
End-to-end speech recognition modeling from de-identified data M. Flechl Shou-Chun Yin Junho Park Peter Skala 17 4 0 12 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yinjiao Lei Shan Yang Jian Cong Linfu Xie Dan Su DiffM 52 12 0 05 Jul 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder Taejun Bak Junmo Lee Hanbin Bae Jinhyeok Yang Jaesung Bae Young-Sun Joo 25 28 0 27 Jun 2022
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection Piotr Kawa Marcin Plata P. Syga AAML 49 23 0 27 Jun 2022
Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks Amin Honarmandi Shandiz L. Tóth 16 4 0 26 Jun 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms Marco Jiralerspong Gauthier Gidel VLM 27 3 0 25 Jun 2022
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis Yi Wang Yi Si 25 0 0 20 Jun 2022
NatiQ: An End-to-end Text-to-Speech System for Arabic Ahmed Abdelali Nadir Durrani C. Demiroğlu Fahim Dalvi Hamdy Mubarak Kareem Darwish 18 14 0 15 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks Shanghua Gao Zhong-Yu Li Qi Han Ming-Ming Cheng Liang Wang 32 34 0 14 Jun 2022
Multi-instrument Music Synthesis with Spectrogram Diffusion Curtis Hawthorne Ian Simon Adam Roberts Neil Zeghidour Josh Gardner Ethan Manilow Jesse Engel DiffM 21 49 0 11 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training Sang-gil Lee Ming-Yu Liu Boris Ginsburg Bryan Catanzaro Sung-Hoon Yoon 22 228 0 09 Jun 2022
Patch-based Object-centric Transformers for Efficient Video Generation Wilson Yan Ryogo Okumura Stephen James Pieter Abbeel DiffM ViT 31 6 0 08 Jun 2022
FlexLip: A Controllable Text-to-Lip System Dan Oneaţă Beáta Lőrincz Adriana Stan H. Cucu 26 3 0 07 Jun 2022
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish A. Oktem Rodolfo Zevallos Yasmin Moslem Günes Öztürk Karen Sarhon 26 0 0 31 May 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data Sungwon Kim Heeseung Kim Sung-Hoon Yoon DiffM 204 52 0 30 May 2022
A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model Jianwen Xie Y. Zhu Juntao Li Ping Li 24 50 0 13 May 2022
Talking Face Generation with Multilingual TTS Hyoung-Kyu Song Sanghyun Woo Junhyeok Lee S. Yang Hyunjae Cho Youseong Lee Dongho Choi Kang-Wook Kim CVBM 40 21 0 13 May 2022
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model J. Valin Ahmed Mustafa Christopher Montgomery Timothy B. Terriberry Michael Klingbeil Paris Smaragdis A. Krishnaswamy 24 18 0 11 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality Xu Tan Jiawei Chen Haohe Liu Jian Cong Chen Zhang ... Lei He Frank Soong Tao Qin Sheng Zhao Tie-Yan Liu 44 213 0 09 May 2022
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence Sangshin Oh Seyun Um Hong-Goo Kang BDL DRL 16 2 0 09 May 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss Efthymios Georgiou Kosmas Kritsis Georgios Paraskevopoulos Athanasios Katsamanis Vassilis Katsouros Alexandros Potamianos 23 3 0 28 Apr 2022
Parallel Synthesis for Autoregressive Speech Generation Po-Chun Hsu Da-Rong Liu Andy T. Liu Hung-yi Lee 42 5 0 25 Apr 2022
Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation Detai Xin Shinnosuke Takamichi T. Okamoto Hisashi Kawai Hiroshi Saruwatari 24 0 0 22 Apr 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis Rongjie Huang Max W. Y. Lam Jun Wang Dan Su Dong Yu Yi Ren Zhou Zhao DiffM 28 166 0 21 Apr 2022
Music Source Separation with Generative Flow Ge Zhu Jordan Darefsky Fei Jiang A. Selitskiy Z. Duan 25 6 0 19 Apr 2022
Learning and controlling the source-filter representation of speech with a variational autoencoder Samir Sadok Simon Leglaive Laurent Girin Xavier Alameda-Pineda Renaud Séguier SSL DRL BDL 32 14 0 14 Apr 2022
A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture Zhe-ming Lu Mengnan He Ruixiong Zhang Caixia Gong GAN 14 2 0 12 Apr 2022
Fine-grained Noise Control for Multispeaker Speech Synthesis Karolos Nikitaras G. Vamvoukakis Nikolaos Ellinas Konstantinos Klapsas K. Markopoulos S. Raptis June Sig Sung Gunu Jho Aimilios Chalamandaris Pirros Tsiakoulis 29 4 0 11 Apr 2022
Karaoker: Alignment-free singing voice synthesis with speech training data Panos Kakoulidis Nikolaos Ellinas G. Vamvoukakis K. Markopoulos June Sig Sung Gunu Jho Pirros Tsiakoulis Aimilios Chalamandaris 12 3 0 08 Apr 2022
Correcting Mispronunciations in Speech using Spectrogram Inpainting Talia Ben Simon Felix Kreuk Faten Awwad Jacob T. Cohen Joseph Keshet 12 2 0 07 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN Minsu Kim Joanna Hong Y. Ro 28 51 0 04 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis Fan Wang Po-Chun Hsu Da-Rong Liu Hung-yi Lee 13 0 0 01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 16 32 0 31 Mar 2022
HiFi-VC: High Quality ASR-Based Voice Conversion A. Kashkin I. Karpukhin S. Shishkin 29 5 0 31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 18 30 0 31 Mar 2022