WaveNet: A Generative Model for Raw Audio

12 September 2016
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
    DiffM

Papers citing "WaveNet: A Generative Model for Raw Audio"

50 / 3,082 papers shown
Title
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system
Noé Tits
Kevin El Haddad
Thierry Dutoit
69
5
0
06 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
127
70
0
06 Mar 2021
Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions
Ruixu Liu
Ju Shen
He Wang
Chong Chen
S. Cheung
V. Asari
3DH
72
31
0
04 Mar 2021
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder
Kazuhiro Kobayashi
Wen-Chin Huang
Yi-Chiao Wu
Patrick Lumban Tobing
Tomoki Hayashi
Tomoki Toda
BDL, DRL
68
19
0
04 Mar 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
131
69
0
02 Mar 2021
A Spectral Enabled GAN for Time Series Data Generation
Kaleb E. Smith
Anthony O. Smith
GAN
45
12
0
02 Mar 2021
Experiments with Rich Regime Training for Deep Learning
Xinyan Li
A. Banerjee
73
2
0
26 Feb 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
208
323
0
25 Feb 2021
Automatic Feature Extraction for Heartbeat Anomaly Detection
Robert-George Colt
Csongor-Huba Várady
Riccardo Volpi
Luigi Malagò
24
4
0
24 Feb 2021
Multi-Task Temporal Convolutional Networks for Joint Recognition of Surgical Phases and Steps in Gastric Bypass Procedures
Sanat Ramesh
Diego Dall'Alba
Cristians Gonzalez
Tong Yu
Pietro Mascagni
Didier Mutter
J. Marescaux
Paolo Fiorini
N. Padoy
83
69
0
24 Feb 2021
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks
Ju Lin
A. Wijngaarden
Kuang-Ching Wang
M. C. Smith
78
51
0
24 Feb 2021
Handling Background Noise in Neural Speech Generation
Tom Denton
Alejandro Luebs
Felicia S. C. Lim
Andrew Storus
Hengchin Yeh
W. Kleijn
Jan Skoglund
52
2
0
23 Feb 2021
Anytime Sampling for Autoregressive Models via Ordered Autoencoding
Yilun Xu
Yang Song
Sahaj Garg
Linyuan Gong
Rui Shu
Aditya Grover
Stefano Ermon
DiffM
93
11
0
23 Feb 2021
Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion
Samuel J. Broughton
Md. Asif Jalal
Roger K. Moore
39
0
0
22 Feb 2021
Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Yangjun Ruan
Karen Ullrich
Daniel de Souza Severo
James Townsend
Ashish Khisti
Arnaud Doucet
Alireza Makhzani
Chris J. Maddison
112
25
0
22 Feb 2021
Anyone GAN Sing
Shreeviknesh Sankaran
Sukavanan Nanjundan
G. Anand
GAN
54
2
0
22 Feb 2021
Introducing an experimental distortion-tolerant speech encryption scheme for secure voice communication
Piotr Krasnowski
J. Lebrun
Bruno Martin
22
2
0
19 Feb 2021
Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure
Zixun Guo
D. Makris
Dorien Herremans
77
24
0
19 Feb 2021
Generative Speech Coding with Predictive Variance Regularization
W. Kleijn
Andrew Storus
Michael Chinen
Tom Denton
Felicia S. C. Lim
Alejandro Luebs
Jan Skoglund
Hengchin Yeh
68
68
0
18 Feb 2021
AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou
Athanasios Katsamanis
27
0
0
18 Feb 2021
Deep Learning Approaches for Forecasting Strawberry Yields and Prices Using Satellite Images and Station-Based Soil Parameters
Mohita Chaudhary
Mohamed Sadok Gastli
Lobna Nassar
Fakhri Karray
18
7
0
17 Feb 2021
One-shot action recognition in challenging therapy scenarios
Alberto Sabater
Laura Santos
J. Santos-Victor
Alexandre Bernardino
Luis Montesano
Ana C. Murillo
129
39
0
17 Feb 2021
Hierarchical VAEs Know What They Don't Know
Jakob Drachmann Havtorn
J. Frellsen
Søren Hauberg
Lars Maaløe
DRL
131
74
0
16 Feb 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Yukiya Hono
Shinji Takaki
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
69
16
0
15 Feb 2021
Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification
Bidisha Sharma
Maulik C. Madhavi
Haizhou Li
51
20
0
15 Feb 2021
Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms
Kleanthis Avramidis
Agelos Kratimenos
C. Garoufis
Athanasia Zlatintsi
Petros Maragos
43
8
0
13 Feb 2021
Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Jonah Casebeer
Vinjai Vale
Umut Isik
J. Valin
Ritwik Giri
A. Krishnaswamy
100
20
0
12 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Jane Polak Scowcroft
95
22
0
12 Feb 2021
DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
Satwinder Singh
Ruili Wang
Yuanhang Qiu
45
26
0
11 Feb 2021
ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
A. Nautsch
Xin Wang
Nicholas W. D. Evans
Tomi Kinnunen
Ville Vestman
Massimiliano Todisco
Héctor Delgado
Md. Sahidullah
Junichi Yamagishi
Kong Aik Lee
197
155
0
11 Feb 2021
Causal Inference for Time series Analysis: Problems, Methods and Evaluation
Raha Moraffah
Paras Sheth
Mansooreh Karami
Anchit Bhattacharya
Qianru Wang
Anique Tahir
A. Raglin
Huan Liu
CML, AI4TS
113
111
0
11 Feb 2021
Self-Supervised VQ-VAE for One-Shot Music Style Transfer
Ondřej Cífka
A. Ozerov
Umut Simsekli
G. Richard
79
28
0
10 Feb 2021
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Giuseppe Ruggiero
Enrico Zovato
Luigi Di Caro
V. Pollet
DiffM
63
10
0
10 Feb 2021
Conditional Loss and Deep Euler Scheme for Time Series Generation
Carl Remlinger
Joseph Mikael
Romuald Elie
DiffM
111
12
0
10 Feb 2021
EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Yu-Wen Chen
Kuo-Hsuan Hung
Shang-Yi Chuang
Jonathan Sherman
Wen-Chin Huang
Xugang Lu
Yu Tsao
86
16
0
07 Feb 2021
Multi-Task Self-Supervised Pre-Training for Music Classification
Ho-Hsiang Wu
Chieh-Chi Kao
Qingming Tang
Ming Sun
Brian McFee
J. P. Bello
Chao Wang
SSL
414
37
0
05 Feb 2021
Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
Jang-Hyun Kim
Wonho Choo
Hosan Jeong
Hyun Oh Song
275
184
0
05 Feb 2021
Invertible DenseNets with Concatenated LipSwish
Yura Perugachi-Diaz
Jakub M. Tomczak
Sandjai Bhulai
139
20
0
04 Feb 2021
CKConv: Continuous Kernel Convolution For Sequential Data
David W. Romero
Anna Kuzina
Erik J. Bekkers
Jakub M. Tomczak
Mark Hoogendoorn
77
126
0
04 Feb 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
295
366
0
01 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
85
42
0
01 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
83
0
0
30 Jan 2021
Time Series (re)sampling using Generative Adversarial Networks
Christian Moller Dahl
Emil N. Sørensen
TTA, AI4TS
66
6
0
30 Jan 2021
Expressive Neural Voice Cloning
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
59
30
0
30 Jan 2021
A causal convolutional neural network for multi-subject motion modeling and generation
Shuaiying Hou
Congyi Wang
Wenlin Zhuang
Yu Chen
Yangang Wang
Hujun Bao
Jinxiang Chai
Weiwei Xu
78
4
0
28 Jan 2021
Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting
Kashif Rasul
Calvin Seward
Ingmar Schuster
Roland Vollgraf
DiffM
194
320
0
28 Jan 2021
Semi-supervised source localization in reverberant environments with deep generative modeling
Michael J. Bianco
Sharon Gannot
Efren Fernandez-Grande
Peter Gerstoft
66
21
0
26 Jan 2021
High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion
M. S. Al-Radhi
34
1
0
25 Jan 2021
Multi-Task Time Series Forecasting With Shared Attention
Zekai Chen
Jiaze E
Xiao Zhang
Hao Sheng
Xiuzhen Cheng
AI4TS
86
20
0
24 Jan 2021
Generating a Doppelganger Graph: Resembling but Distinct
Yuliang Ji
Ru Huang
Jie Chen
Yuanzhe Xi
55
2
0
23 Jan 2021