ResearchTrend.AI
WaveNet: A Generative Model for Raw Audio

12 September 2016
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
    DiffM

Papers citing "WaveNet: A Generative Model for Raw Audio"

50 / 3,082 papers shown
Title
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA, AuLLM
190
39
0
24 Aug 2023
A Survey of AI Music Generation Tools and Models
Yueyue Zhu
Jared Baca
Banafsheh Rekabdar
Reza Rawassizadeh
MGen
108
18
0
24 Aug 2023
An Initial Exploration: Learning to Generate Realistic Audio for Silent Video
Matthew Martel
Jack Wagner
VGen
45
0
0
23 Aug 2023
Efficient Transfer Learning in Diffusion Models via Adversarial Noise
Xiyu Wang
Baijiong Lin
Daochang Liu
Chang Xu
DiffM
99
3
0
23 Aug 2023
Example-Based Framework for Perceptually Guided Audio Texture Generation
Purnima Kamath
Chitralekha Gupta
L. Wyse
Suranga Nanayakkara
48
4
0
23 Aug 2023
Complex-valued neural networks for voice anti-spoofing
Nicolas Müller
Philip Sperl
Konstantin Böttinger
79
16
0
22 Aug 2023
MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling
Ziwei Yang
Zhengjun Chen
Yasuko Matsubara
Yasushi Sakurai
61
2
0
17 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
115
1
0
14 Aug 2023
Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data
Jérémy Cochoy
32
0
0
14 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
82
5
0
14 Aug 2023
Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
Zhichao Wang
M. Dai
Keld Lundgaard
VGen, DiffM
78
2
0
12 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
133
246
0
10 Aug 2023
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Liyang Chen
Zhiyong Wu
Runnan Li
Weihong Bao
Jun Ling
Xuejiao Tan
Sheng Zhao
67
5
0
09 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen, VLM, DiffM
167
41
0
09 Aug 2023
Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables
Taoran Sheng
Manfred Huber
SSL, HAI
65
21
0
06 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
72
1
0
03 Aug 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
85
24
0
02 Aug 2023
Music De-limiter Networks via Sample-wise Gain Inversion
Chang-Bin Jeon
Kyogu Lee
55
1
0
02 Aug 2023
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction
Renteng Yuan
Mohamed Abdel-Aty
Q. Xiang
Zijin Wang
Xin Gu
90
10
0
01 Aug 2023
Generative models for wearables data
Arinbjorn Kolbeinsson
L. Foschini
MedIm
57
0
0
31 Jul 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen, DiffM
100
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
102
16
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
59
41
0
31 Jul 2023
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation
Vu Ngoc Tu
V. Huynh
Hyung-Jeong Yang
M. Zaheer
Shah Nawaz
Karthik Nandakumar
Soo-Hyung Kim
78
5
0
31 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
Jianwu Dang
DiffM
78
9
0
28 Jul 2023
Self-Supervised Visual Acoustic Matching
Arjun Somayazulu
Changan Chen
Kristen Grauman
SSL
91
13
0
27 Jul 2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Youqiang Zheng
Li Xiao
Weiping Tu
Yuhong Yang
Xinmeng Xu
105
6
0
25 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
58
1
0
23 Jul 2023
PartDiff: Image Super-resolution with Partial Diffusion Models
Kai Zhao
A. Hung
Kai-Lin Pang
Haoxin Zheng
Kyunghyun Sung
DiffM, MedIm
58
3
0
21 Jul 2023
Learning minimal representations of stochastic processes with variational autoencoders
Gabriel Fernández-Fernández
Carlo Manzo
M. Lewenstein
A. Dauphin
Gorka Muñoz-Gil
DiffM
62
6
0
21 Jul 2023
Progressive distillation diffusion for raw music generation
Svetlana Pavlova
DiffM
67
0
0
20 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
79
2
0
20 Jul 2023
TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition
Isabel Funke
Dominik Rivoir
Stefanie Krell
Stefanie Speidel
90
4
0
19 Jul 2023
Synthetic Lagrangian Turbulence by Generative Diffusion Models
Tianyi Li
Luca Biferale
F. Bonaccorso
M. A. Scarpolini
M. Buzzicotti
DiffM
74
44
0
17 Jul 2023
GBT: Two-stage transformer framework for non-stationary time series forecasting
Li Shen
Yuning Wei
Yangzhu Wang
AI4TS
93
24
0
17 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wei Cao
Jiacheng Sun
Zechao Li
DiffM
118
10
0
17 Jul 2023
Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training
Yao Wei
Yanchao Sun
Ruijie Zheng
Sai H. Vemprala
Rogerio Bonatti
Shuhang Chen
Ratnesh Madaan
Zhongjie Ba
Ashish Kapoor
Shuang Ma
OffRL
63
17
0
16 Jul 2023
Benchmarks and Custom Package for Electrical Load Forecasting
Zhixian Wang
Qingsong Wen
Chaoli Zhang
Liang Sun
Leandro Von Krannichfeldt
Yi Wang
AI4TS
85
4
0
14 Jul 2023
Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows
Vinayak Gupta
Srikanta J. Bedathur
AI4TS
69
1
0
13 Jul 2023
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Rishabh Ranjan
Mayank Vatsa
Richa Singh
74
4
0
13 Jul 2023
SnakeSynth: New Interactions for Generative Audio Synthesis
Eric Easthope
59
0
0
11 Jul 2023
DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles
Kareem A. Eltouny
Wansong Liu
Sibo Tian
Minghui Zheng
Xiao Liang
3DH
53
7
0
07 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
R. Bhattacharjea
Nathan E. West
SSL
32
1
0
06 Jul 2023
On the Constrained Time-Series Generation Problem
Andrea Coletta
Sriram Gopalakrishnan
Daniel Borrajo
Svitlana Vyetrenko
DiffM, AI4TS
112
40
0
04 Jul 2023
Disentanglement in a GAN for Unconditional Speech Synthesis
Matthew Baas
Herman Kamper
DiffM
87
4
0
04 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
46
6
0
03 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
41
2
0
03 Jul 2023
Conformer LLMs -- Convolution Augmented Large Language Models
Prateek Verma
62
1
0
02 Jul 2023
Improving the Transferability of Time Series Forecasting with Decomposition Adaptation
Yan-hong Gao
Yan Wang
Qiang Wang
AI4TS
54
0
0
30 Jun 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
85
4
0
29 Jun 2023