arXiv:1612.07837
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
22 December 2016
Soroush Mehri
Kundan Kumar
Ishaan Gulrajani
Rithesh Kumar
Shubham Jain
Jose M. R. Sotelo
Aaron Courville
Yoshua Bengio
Papers citing "SampleRNN: An Unconditional End-to-End Neural Audio Generation Model"
50 / 274 papers shown
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
20
0
0
14 May 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
69
0
0
17 Apr 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze
M. Izadi
Shlomo Dubnov
DiffM
44
2
0
31 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
87
2
0
14 Dec 2024
Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise
Tornike Karchkhadze
Keren Shao
Shlomo Dubnov
75
0
0
12 Dec 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
30
0
0
18 Nov 2024
Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations
Quoc-Huy Trinh
Minh-Van Nguyen
Trong-Hieu Nguyen-Mau
Khoa Tran
Thanh Do
35
0
0
03 Nov 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
36
6
0
27 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
25
1
0
13 Sep 2024
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation
Maximilian Baronig
Romain Ferrand
Silvester Sabathiel
Robert Legenstein
48
3
0
14 Aug 2024
Combining audio control and style transfer using latent diffusion
Andreas Maier
Yuliya Burankova
Anne Hartebrodt
David B. Blumenthal
DiffM
34
2
0
31 Jul 2024
Synthetic Trajectory Generation Through Convolutional Neural Networks
Jesse Merhi
Erik Buchholz
S. Kanhere
37
0
0
24 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
47
15
0
15 Jul 2024
MusicScore: A Dataset for Music Score Modeling and Generation
Yuheng Lin
Zheqi Dai
Qiuqiang Kong
VLM
37
2
0
17 Jun 2024
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
J. Nistal
Marco Pasini
Cyran Aouameur
M. Grachten
Stefan Lattner
DiffM
53
16
0
12 Jun 2024
Robust Multi-Modal Speech In-Painting: A Sequence-to-Sequence Approach
Mahsa Kadkhodaei Elyaderani
Shahram Shirani
34
0
0
02 Jun 2024
Creative Text-to-Audio Generation via Synthesizer Programming
Manuel Cherep
Nikhil Singh
Jessica Shand
25
3
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
56
0
0
31 May 2024
AFEN: Respiratory Disease Classification using Ensemble Learning
Rahul Nadkarni
Emmanouil Nikolakakis
Razvan Marinescu
11
0
0
08 May 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
37
4
0
30 Apr 2024
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGen
DiffM
44
39
0
16 Apr 2024
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
Keiichi Tokuda
DiffM
35
2
0
22 Feb 2024
An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation
Xin Jin
Wu Zhou
Jinyu Wang
Duo Xu
Yongsen Zheng
33
1
0
13 Feb 2024
Bass Accompaniment Generation via Latent Diffusion
Marco Pasini
M. Grachten
Stefan Lattner
59
11
0
02 Feb 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
26
1
0
19 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
41
5
0
18 Jan 2024
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Ge Zhu
Yutong Wen
M. Carbonneau
Zhiyao Duan
DiffM
48
7
0
15 Nov 2023
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
Tiberiu Boros
Stefan Daniel Dumitrescu
Ionut Mironica
Radu Chivereanu
GAN
14
1
0
14 Oct 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
41
0
0
26 Sep 2023
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer
Peter Ochieng
DiffM
22
0
0
18 Sep 2023
DDSP-SFX: Acoustically-guided sound effects generation with differentiable digital signal processing
Yunyi Liu
Craig Jin
David Gunawan
16
2
0
14 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
32
4
0
02 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
27
21
0
29 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Xiaozhong Liu
78
31
0
27 Aug 2023
An Initial Exploration: Learning to Generate Realistic Audio for Silent Video
Matthew Martel
Jack Wagner
VGen
16
0
0
23 Aug 2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Youqiang Zheng
Li Xiao
Weiping Tu
Yuhong Yang
Xinmeng Xu
41
6
0
25 Jul 2023
Progressive distillation diffusion for raw music generation
Svetlana Pavlova
DiffM
23
0
0
20 Jul 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
56
288
0
11 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
25
79
0
01 Jun 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation
Zhe Ye
Wei Xue
Xuejiao Tan
Qi-fei Liu
Yi-Ting Guo
26
2
0
22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
34
13
0
13 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
34
1
0
09 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
24
1
0
26 Apr 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
24
4
0
03 Mar 2023
Continuous descriptor-based control for deep audio synthesis
Ninon Devis
Nils Demerlé
Sarah Nabi
David Genova
P. Esling
22
9
0
27 Feb 2023
Hypernetworks build Implicit Neural Representations of Sounds
Filip Szatkowski
Karol J. Piczak
Przemysław Spurek
Jacek Tabor
Tomasz Trzciński
24
11
0
09 Feb 2023
ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
Peng Fei Zhu
Chao Pang
Yekun Chai
Lei Li
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
DiffM
11
20
0
09 Feb 2023
SingSong: Generating musical accompaniments from singing
Chris Donahue
Antoine Caillon
Adam Roberts
Ethan Manilow
P. Esling
...
Mauro Verzetti
Ian Simon
Olivier Pietquin
Neil Zeghidour
Jesse Engel
37
52
0
30 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
32
22
0
30 Dec 2022