WaveNet: A Generative Model for Raw Audio

12 September 2016

Papers citing "WaveNet: A Generative Model for Raw Audio"

50 / 3,082 papers shown

Title
What Averages Do Not Tell -- Predicting Real Life Processes with Sequential Deep Learning István Ketykó F. Mannhardt Marwan Hassani B. V. Dongen AI4TS 66 10 0 19 Oct 2021
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding Pratik Fegade Tianqi Chen Phillip B. Gibbons T. Mowry 87 29 0 19 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis Max Morrison Rithesh Kumar Kundan Kumar Prem Seetharaman Aaron Courville Yoshua Bengio GAN 132 72 0 19 Oct 2021
CycleFlow: Purify Information Factors by Cycle Loss Haoran Sun Chen Chen Lantian Li Dong Wang 72 1 0 18 Oct 2021
KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke Xiaobin Zhuang Huiran Yu Weifeng Zhao Tao Jiang Peng Hu 90 6 0 18 Oct 2021
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis Yongmao Zhang Jian Cong Heyang Xue Lei Xie Pengcheng Zhu Mengxiao Bi 99 77 0 17 Oct 2021
Taming Visually Guided Sound Generation Vladimir E. Iashin Esa Rahtu VLM 133 128 0 17 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 99 43 0 15 Oct 2021
Advances and Challenges in Deep Lip Reading Marzieh Oghbaie Arian Sabaghi Kooshan Hashemifard Mohammad Akbari VLM 70 15 0 15 Oct 2021
Diffusion Normalizing Flow Qinsheng Zhang Yongxin Chen DiffM 115 94 0 14 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation Rongjie Huang Chenye Cui Feiyang Chen Yi Ren Jinglin Liu Zhou Zhao Baoxing Huai N. Yuan GAN 203 63 0 14 Oct 2021
SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs Adrián Barahona-Ríos Tom Collins GAN 49 4 0 14 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data Haitong Zhang Yue Lin 58 0 0 14 Oct 2021
Multistage linguistic conditioning of convolutional layers for speech emotion recognition Andreas Triantafyllopoulos U. Reichel Shuo Liu Simon Huber F. Eyben Björn W. Schuller 101 11 0 13 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis Soonbeom Choi Juhan Nam 67 14 0 13 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding Sergey Nikonorov Berrak Sisman Mingyang Zhang Haizhou Li 41 3 0 13 Oct 2021
A Multi-scale Time-series Dataset with Benchmark for Machine Learning in Decarbonized Energy Grids Xiangtian Zheng Nan Xu Loc Trinh Dongqi Wu Tong Huang S. Sivaranjani Yan Liu Le Xie AI4CE 67 47 0 12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning Paarth Neekhara Jason Chun Lok Li Boris Ginsburg 144 15 0 12 Oct 2021
Unsupervised Source Separation via Bayesian Inference in the Latent Domain Michele Mancusi Emilian Postolache Giorgio Mariani Marco Fumero Andrea Santilli Luca Cosmo Emanuele Rodolà BDL 62 2 0 11 Oct 2021
Pitch Preservation In Singing Voice Synthesis Shujun Liu Hai Zhu Kun Wang Huajun Wang 50 0 0 11 Oct 2021
Application of Graph Convolutions in a Lightweight Model for Skeletal Human Motion Forecasting L. Hermes Barbara Hammer M. Schilling 3DH 53 4 0 10 Oct 2021
Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain Zengwei Yao Wenjie Pei Fanglin Chen Guangming Lu David C. Zhang 74 12 0 10 Oct 2021
Denoising Diffusion Gamma Models Eliya Nachmani S. Robin Lior Wolf DiffM VLM 85 32 0 10 Oct 2021
F-Divergences and Cost Function Locality in Generative Modelling with Quantum Circuits Chiara Leadbeater Louis Sharrock Brian Coyle Marcello Benedetti 59 11 0 08 Oct 2021
Temporal Convolutions for Multi-Step Quadrotor Motion Prediction Sam Looper Steven L. Waslander 93 5 0 08 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 72 16 0 08 Oct 2021
MilliTRACE-IR: Contact Tracing and Temperature Screening via mm-Wave and Infrared Sensing Marco Canil Jacopo Pegoraro Michele Rossi 87 13 0 08 Oct 2021
ATISS: Autoregressive Transformers for Indoor Scene Synthesis Despoina Paschalidou Amlan Kar Maria Shugrina Karsten Kreis Andreas Geiger Sanja Fidler 3DV ViT 143 155 0 07 Oct 2021
Cloning one's voice using very limited data in the wild Dongyang Dai Yuan-Jui Chen Li Chen Ming Tu Lu Liu Rui Xia Qiao Tian Yuping Wang Yuxuan Wang SyDa 61 9 0 07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 93 20 0 07 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 85 23 0 06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks E. Hortal Rodrigo Brechard Alarcia GAN 48 2 0 06 Oct 2021
3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video Justin Wilson Ming-Chia Lin 44 1 0 05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding Saurabhchand Bhati Jesús Villalba Piotr Żelasko Laureano Moro-Velazquez Najim Dehak SSL 134 23 0 05 Oct 2021
Networked Time Series Prediction with Incomplete Data via Generative Adversarial Network Yichen Zhu Bo Jiang Haiming Jin Mengtian Zhang Feng Gao Jianqiang Huang Tao Lin Xinbing Wang GNN AI4TS 100 5 0 05 Oct 2021
Autoregressive Diffusion Models Emiel Hoogeboom Alexey A. Gritsenko Jasmijn Bastings Ben Poole Rianne van den Berg Tim Salimans DiffM 134 155 0 05 Oct 2021
WaveBeat: End-to-end beat and downbeat tracking in the time domain C. Steinmetz Joshua D. Reiss 28 9 0 04 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis Cheng-I Jeff Lai Erica Cooper Yang Zhang Shiyu Chang Kaizhi Qian ... Yung-Sung Chuang Alexander H. Liu Junichi Yamagishi David D. Cox James R. Glass 71 6 0 04 Oct 2021
A review of Generative Adversarial Networks (GANs) and its applications in a wide variety of disciplines -- From Medical to Remote Sensing Ankan Dash J. Ye Guiling Wang MedIm AI4CE 76 99 0 01 Oct 2021
Multi Scale Graph Wavenet for Wind Speed Forecasting Neetesh Rathore Pradeep Rathore Arghya Basak S. Nistala Venkataramana Runkana AI4TS 113 19 0 30 Sep 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech Yi Ren Jinglin Liu Zhou Zhao 137 79 0 30 Sep 2021
USEV: Universal Speaker Extraction with Visual Cue Zexu Pan Meng Ge Haizhou Li 80 44 0 30 Sep 2021
Multimodal Emotion Recognition with High-level Speech and Text Features M. R. Makiuchi Kuniaki Uto Koichi Shinoda 85 72 0 29 Sep 2021
Vitruvion: A Generative Model of Parametric CAD Sketches Ari Seff Wenda Zhou Nick Richardson Ryan P. Adams 83 66 0 29 Sep 2021
VoiceFixer: Toward General Speech Restoration with Neural Vocoder Haohe Liu Qiuqiang Kong Qiao Tian Yan Zhao DeLiang Wang Chuanzeng Huang Yuxuan Wang 98 58 0 28 Sep 2021
MSR-NV: Neural Vocoder Using Multiple Sampling Rates Kentaro Mitsui Kei Sawada 109 0 0 28 Sep 2021
Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI? Roger Creus Castanyer Silverio Martínez-Fernández Xavier Franch 99 15 0 28 Sep 2021
Audio-to-Image Cross-Modal Generation Maciej Żelaszczyk Jacek Mańdziuk DiffM 120 17 0 27 Sep 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis Manh Luong Viet-Anh Tran 26 2 0 27 Sep 2021
Dynamic Adaptive Spatio-temporal Graph Convolution for fMRI Modelling A. E. Gazzar R. Thomas G. Wingen 75 20 0 26 Sep 2021