1802.08435
Efficient Neural Audio Synthesis
23 February 2018
Nal Kalchbrenner
Erich Elsen
Karen Simonyan
Seb Noury
Norman Casagrande
Edward Lockhart
Florian Stimberg
Aaron van den Oord
Sander Dieleman
Koray Kavukcuoglu
Papers citing
"Efficient Neural Audio Synthesis"
50 / 469 papers shown
Title
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
Yunkee Chae
Kyogu Lee
26
0
0
19 Jun 2025
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
Yicheng Gu
Chaoren Wang
Jing Zhang
Xueyao Zhang
Zihao Fang
Haorui He
Zhizheng Wu
79
3
0
14 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
56
0
0
14 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yize Zhang
Yuan Lu
Yucen He
70
1
0
12 May 2025
SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling
Yaofei Wang
Gang Pei
Kejiang Chen
Jinyang Ding
Chao Pan
Weilong Pang
Donghui Hu
Weinan Zhang
79
2
0
25 Mar 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
93
0
0
20 Mar 2025
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker
Matthew Le
Ricky T. Q. Chen
Wei-Ning Hsu
Timo Gerkmann
Alexander Richard
Yi-Chiao Wu
98
1
0
03 Mar 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
105
0
0
26 Feb 2025
Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
Minje Kim
Jan Skoglund
97
2
0
08 Jan 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
121
6
0
26 Dec 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
59
3
0
14 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
Tomoki Toda
72
2
0
11 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yanjie Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
75
17
0
02 Nov 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
127
9
0
29 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
131
5
0
23 Sep 2024
BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation
Seyed Rohollah Hosseyni
Ali Ahmad Rahmani
S. J. Seyedmohammadi
Sanaz Seyedin
Arash Mohammadi
DiffM
93
7
0
17 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
104
8
0
13 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
110
2
0
12 Sep 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
61
0
0
13 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
97
5
0
09 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
136
0
0
06 Aug 2024
LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation
Dwij Mehta
Aditya Mehta
Pratik Narang
DiffM
76
0
0
04 Aug 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
95
16
0
15 Jul 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Hyunjae Cho
Junhyeok Lee
Wonbin Jung
51
0
0
10 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
89
8
0
07 Jun 2024
Creative Text-to-Audio Generation via Synthesizer Programming
Manuel Cherep
Nikhil Singh
Jessica Shand
81
4
0
01 Jun 2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nolan Dey
Shane Bergsma
Joel Hestness
75
5
0
24 May 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
48
9
0
08 May 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
77
2
0
02 May 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
102
4
0
30 Apr 2024
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Haizhou Li
Zhizheng Wu
55
3
0
26 Apr 2024
Decoupled Weight Decay for Any $p$ Norm
N. Outmezguine
Noam Levi
86
3
0
16 Apr 2024
Personalized Neural Speech Codec
Inseon Jang
Haici Yang
Wootaek Lim
Seung-Wha Beack
Minje Kim
73
1
0
31 Mar 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
57
0
0
25 Mar 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
77
0
0
13 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
144
3
0
08 Mar 2024
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
Keiichi Tokuda
DiffM
67
3
0
22 Feb 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
45
1
0
31 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
37
1
0
19 Jan 2024
Always-Sparse Training by Growing Connections with Guided Stochastic Exploration
Mike Heddes
Narayan Srinivasa
T. Givargis
Alexandru Nicolau
273
0
0
12 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
53
0
0
03 Jan 2024
SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting
Shane Bergsma
Timothy J. Zeyl
Lei Guo
AI4TS
98
3
0
22 Dec 2023
C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting
Shane Bergsma
Timothy J. Zeyl
J. R. Anaraki
Lei Guo
BDL
AI4TS
91
12
0
22 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
91
35
0
15 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
66
3
0
11 Dec 2023
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Zhizheng Wu
74
12
0
25 Nov 2023
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
A. R. Bargum
Stefania Serafin
Cumhur Erkut
73
4
0
14 Nov 2023
Music ControlNet: Multiple Time-varying Controls for Music Generation
Shih-Lun Wu
Chris Donahue
Shinji Watanabe
Nicholas J. Bryan
DiffM
MGen
111
61
0
13 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
100
30
0
10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
86
0
0
08 Nov 2023