WaveNet: A Generative Model for Raw Audio

12 September 2016
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
    DiffM

Papers citing "WaveNet: A Generative Model for Raw Audio"

50 / 3,082 papers shown
What Averages Do Not Tell -- Predicting Real Life Processes with Sequential Deep Learning
István Ketykó
F. Mannhardt
Marwan Hassani
B. V. Dongen
AI4TS
66
10
0
19 Oct 2021
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
Pratik Fegade
Tianqi Chen
Phillip B. Gibbons
T. Mowry
87
29
0
19 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis
Max Morrison
Rithesh Kumar
Kundan Kumar
Prem Seetharaman
Aaron Courville
Yoshua Bengio
GAN
132
72
0
19 Oct 2021
CycleFlow: Purify Information Factors by Cycle Loss
Haoran Sun
Chen Chen
Lantian Li
Dong Wang
72
1
0
18 Oct 2021
KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke
Xiaobin Zhuang
Huiran Yu
Weifeng Zhao
Tao Jiang
Peng Hu
90
6
0
18 Oct 2021
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis
Yongmao Zhang
Jian Cong
Heyang Xue
Lei Xie
Pengcheng Zhu
Mengxiao Bi
99
77
0
17 Oct 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
133
128
0
17 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM VGen
99
43
0
15 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
70
15
0
15 Oct 2021
Diffusion Normalizing Flow
Qinsheng Zhang
Yongxin Chen
DiffM
115
94
0
14 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
203
63
0
14 Oct 2021
SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
Adrián Barahona-Ríos
Tom Collins
GAN
49
4
0
14 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
58
0
0
14 Oct 2021
Multistage linguistic conditioning of convolutional layers for speech emotion recognition
Andreas Triantafyllopoulos
U. Reichel
Shuo Liu
Simon Huber
F. Eyben
Björn W. Schuller
101
11
0
13 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
67
14
0
13 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding
Sergey Nikonorov
Berrak Sisman
Mingyang Zhang
Haizhou Li
41
3
0
13 Oct 2021
A Multi-scale Time-series Dataset with Benchmark for Machine Learning in Decarbonized Energy Grids
Xiangtian Zheng
Nan Xu
Loc Trinh
Dongqi Wu
Tong Huang
S. Sivaranjani
Yan Liu
Le Xie
AI4CE
67
47
0
12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
144
15
0
12 Oct 2021
Unsupervised Source Separation via Bayesian Inference in the Latent Domain
Michele Mancusi
Emilian Postolache
Giorgio Mariani
Marco Fumero
Andrea Santilli
Luca Cosmo
Emanuele Rodolà
BDL
62
2
0
11 Oct 2021
Pitch Preservation In Singing Voice Synthesis
Shujun Liu
Hai Zhu
Kun Wang
Huajun Wang
50
0
0
11 Oct 2021
Application of Graph Convolutions in a Lightweight Model for Skeletal Human Motion Forecasting
L. Hermes
Barbara Hammer
M. Schilling
3DH
53
4
0
10 Oct 2021
Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain
Zengwei Yao
Wenjie Pei
Fanglin Chen
Guangming Lu
David C. Zhang
74
12
0
10 Oct 2021
Denoising Diffusion Gamma Models
Eliya Nachmani
S. Robin
Lior Wolf
DiffM VLM
85
32
0
10 Oct 2021
F-Divergences and Cost Function Locality in Generative Modelling with Quantum Circuits
Chiara Leadbeater
Louis Sharrock
Brian Coyle
Marcello Benedetti
59
11
0
08 Oct 2021
Temporal Convolutions for Multi-Step Quadrotor Motion Prediction
Sam Looper
Steven L. Waslander
93
5
0
08 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
72
16
0
08 Oct 2021
MilliTRACE-IR: Contact Tracing and Temperature Screening via mm-Wave and Infrared Sensing
Marco Canil
Jacopo Pegoraro
Michele Rossi
87
13
0
08 Oct 2021
ATISS: Autoregressive Transformers for Indoor Scene Synthesis
Despoina Paschalidou
Amlan Kar
Maria Shugrina
Karsten Kreis
Andreas Geiger
Sanja Fidler
3DV ViT
143
155
0
07 Oct 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
61
9
0
07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
93
20
0
07 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
85
23
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
48
2
0
06 Oct 2021
3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video
Justin Wilson
Ming-Chia Lin
44
1
0
05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
134
23
0
05 Oct 2021
Networked Time Series Prediction with Incomplete Data via Generative Adversarial Network
Yichen Zhu
Bo Jiang
Haiming Jin
Mengtian Zhang
Feng Gao
Jianqiang Huang
Tao Lin
Xinbing Wang
GNN AI4TS
100
5
0
05 Oct 2021
Autoregressive Diffusion Models
Emiel Hoogeboom
Alexey A. Gritsenko
Jasmijn Bastings
Ben Poole
Rianne van den Berg
Tim Salimans
DiffM
134
155
0
05 Oct 2021
WaveBeat: End-to-end beat and downbeat tracking in the time domain
C. Steinmetz
Joshua D. Reiss
28
9
0
04 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai
Erica Cooper
Yang Zhang
Shiyu Chang
Kaizhi Qian
...
Yung-Sung Chuang
Alexander H. Liu
Junichi Yamagishi
David D. Cox
James R. Glass
71
6
0
04 Oct 2021
A review of Generative Adversarial Networks (GANs) and its applications in a wide variety of disciplines -- From Medical to Remote Sensing
Ankan Dash
J. Ye
Guiling Wang
MedIm AI4CE
76
99
0
01 Oct 2021
Multi Scale Graph Wavenet for Wind Speed Forecasting
Neetesh Rathore
Pradeep Rathore
Arghya Basak
S. Nistala
Venkataramana Runkana
AI4TS
113
19
0
30 Sep 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
137
79
0
30 Sep 2021
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
80
44
0
30 Sep 2021
Multimodal Emotion Recognition with High-level Speech and Text Features
M. R. Makiuchi
Kuniaki Uto
Koichi Shinoda
85
72
0
29 Sep 2021
Vitruvion: A Generative Model of Parametric CAD Sketches
Ari Seff
Wenda Zhou
Nick Richardson
Ryan P. Adams
83
66
0
29 Sep 2021
VoiceFixer: Toward General Speech Restoration with Neural Vocoder
Haohe Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
98
58
0
28 Sep 2021
MSR-NV: Neural Vocoder Using Multiple Sampling Rates
Kentaro Mitsui
Kei Sawada
109
0
0
28 Sep 2021
Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI?
Roger Creus Castanyer
Silverio Martínez-Fernández
Xavier Franch
99
15
0
28 Sep 2021
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
120
17
0
27 Sep 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Manh Luong
Viet-Anh Tran
26
2
0
27 Sep 2021
Dynamic Adaptive Spatio-temporal Graph Convolution for fMRI Modelling
A. E. Gazzar
R. Thomas
G. Wingen
75
20
0
26 Sep 2021