1609.03499
WaveNet: A Generative Model for Raw Audio
12 September 2016
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
Papers citing
"WaveNet: A Generative Model for Raw Audio"
50 / 3,082 papers shown
Title
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
190
39
0
24 Aug 2023
A Survey of AI Music Generation Tools and Models
Yueyue Zhu
Jared Baca
Banafsheh Rekabdar
Reza Rawassizadeh
MGen
108
18
0
24 Aug 2023
An Initial Exploration: Learning to Generate Realistic Audio for Silent Video
Matthew Martel
Jack Wagner
VGen
45
0
0
23 Aug 2023
Efficient Transfer Learning in Diffusion Models via Adversarial Noise
Xiyu Wang
Baijiong Lin
Daochang Liu
Chang Xu
DiffM
99
3
0
23 Aug 2023
Example-Based Framework for Perceptually Guided Audio Texture Generation
Purnima Kamath
Chitralekha Gupta
L. Wyse
Suranga Nanayakkara
48
4
0
23 Aug 2023
Complex-valued neural networks for voice anti-spoofing
Nicolas Müller
Philip Sperl
Konstantin Böttinger
79
16
0
22 Aug 2023
MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling
Ziwei Yang
Zhengjun Chen
Yasuko Matsubara
Yasushi Sakurai
61
2
0
17 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
115
1
0
14 Aug 2023
Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data
Jérémy Cochoy
32
0
0
14 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
82
5
0
14 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
Zhichao Wang
M. Dai
Keld Lundgaard
VGen
DiffM
78
2
0
12 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
133
246
0
10 Aug 2023
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Liyang Chen
Zhiyong Wu
Runnan Li
Weihong Bao
Jun Ling
Xuejiao Tan
Sheng Zhao
67
5
0
09 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
167
41
0
09 Aug 2023
Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables
Taoran Sheng
Manfred Huber
SSL
HAI
65
21
0
06 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
72
1
0
03 Aug 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
85
24
0
02 Aug 2023
Music De-limiter Networks via Sample-wise Gain Inversion
Chang-Bin Jeon
Kyogu Lee
55
1
0
02 Aug 2023
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction
Renteng Yuan
Mohamed Abdel-Aty
Q. Xiang
Zijin Wang
Xin Gu
90
10
0
01 Aug 2023
Generative models for wearables data
Arinbjorn Kolbeinsson
L. Foschini
MedIm
57
0
0
31 Jul 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
100
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
102
16
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
59
41
0
31 Jul 2023
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation
Vu Ngoc Tu
V. Huynh
Hyung-Jeong Yang
M. Zaheer
Shah Nawaz
Karthik Nandakumar
Soo-Hyung Kim
78
5
0
31 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
Jianwu Dang
DiffM
78
9
0
28 Jul 2023
Self-Supervised Visual Acoustic Matching
Arjun Somayazulu
Changan Chen
Kristen Grauman
SSL
91
13
0
27 Jul 2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Youqiang Zheng
Li Xiao
Weiping Tu
Yuhong Yang
Xinmeng Xu
105
6
0
25 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
58
1
0
23 Jul 2023
PartDiff: Image Super-resolution with Partial Diffusion Models
Kai Zhao
A. Hung
Kai-Lin Pang
Haoxin Zheng
Kyunghyun Sung
DiffM
MedIm
58
3
0
21 Jul 2023
Learning minimal representations of stochastic processes with variational autoencoders
Gabriel Fernández-Fernández
Carlo Manzo
M. Lewenstein
A. Dauphin
Gorka Muñoz-Gil
DiffM
62
6
0
21 Jul 2023
Progressive distillation diffusion for raw music generation
Svetlana Pavlova
DiffM
67
0
0
20 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
79
2
0
20 Jul 2023
TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition
Isabel Funke
Dominik Rivoir
Stefanie Krell
Stefanie Speidel
90
4
0
19 Jul 2023
Synthetic Lagrangian Turbulence by Generative Diffusion Models
Tianyi Li
Luca Biferale
F. Bonaccorso
M. A. Scarpolini
M. Buzzicotti
DiffM
74
44
0
17 Jul 2023
GBT: Two-stage transformer framework for non-stationary time series forecasting
Li Shen
Yuning Wei
Yangzhu Wang
AI4TS
93
24
0
17 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wei Cao
Jiacheng Sun
Zechao Li
DiffM
118
10
0
17 Jul 2023
Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training
Yao Wei
Yanchao Sun
Ruijie Zheng
Sai H. Vemprala
Rogerio Bonatti
Shuhang Chen
Ratnesh Madaan
Zhongjie Ba
Ashish Kapoor
Shuang Ma
OffRL
63
17
0
16 Jul 2023
Benchmarks and Custom Package for Electrical Load Forecasting
Zhixian Wang
Qingsong Wen
Chaoli Zhang
Liang Sun
Leandro Von Krannichfeldt
Yi Wang
AI4TS
85
4
0
14 Jul 2023
Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows
Vinayak Gupta
Srikanta J. Bedathur
AI4TS
69
1
0
13 Jul 2023
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Rishabh Ranjan
Mayank Vatsa
Richa Singh
74
4
0
13 Jul 2023
SnakeSynth: New Interactions for Generative Audio Synthesis
Eric Easthope
59
0
0
11 Jul 2023
DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles
Kareem A. Eltouny
Wansong Liu
Sibo Tian
Minghui Zheng
Xiao Liang
3DH
53
7
0
07 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
R. Bhattacharjea
Nathan E. West
SSL
32
1
0
06 Jul 2023
On the Constrained Time-Series Generation Problem
Andrea Coletta
Sriram Gopalakrishnan
Daniel Borrajo
Svitlana Vyetrenko
DiffM
AI4TS
112
40
0
04 Jul 2023
Disentanglement in a GAN for Unconditional Speech Synthesis
Matthew Baas
Herman Kamper
DiffM
87
4
0
04 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
46
6
0
03 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
41
2
0
03 Jul 2023
Conformer LLMs -- Convolution Augmented Large Language Models
Prateek Verma
62
1
0
02 Jul 2023
Improving the Transferability of Time Series Forecasting with Decomposition Adaptation
Yan-hong Gao
Yan Wang
Qiang Wang
AI4TS
54
0
0
30 Jun 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
85
4
0
29 Jun 2023