ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1802.08435
  4. Cited By
Efficient Neural Audio Synthesis

Efficient Neural Audio Synthesis

23 February 2018
Nal Kalchbrenner
Erich Elsen
Karen Simonyan
Seb Noury
Norman Casagrande
Edward Lockhart
Florian Stimberg
Aaron van den Oord
Sander Dieleman
Koray Kavukcuoglu
ArXivPDFHTML

Papers citing "Efficient Neural Audio Synthesis"

50 / 472 papers shown
Title
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
36
27
0
02 Nov 2023
An overview of text-to-speech systems and media applications
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
15
3
0
22 Oct 2023
Generative Adversarial Training for Text-to-Speech Synthesis Based on
  Raw Phonetic Input and Explicit Prosody Modelling
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
Tiberiu Boros
Stefan Daniel Dumitrescu
Ionut Mironica
Radu Chivereanu
GAN
22
1
0
14 Oct 2023
Deepfake audio as a data augmentation technique for training automatic
  speech to text transcription models
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
Alexandre R. Ferreira
Cláudio E. C. Campelo
10
1
0
22 Sep 2023
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
Belen Alastruey
Aleix Sant
Gerard I. Gállego
David Dale
Marta R. Costa-jussá
AuLLM
33
3
0
20 Sep 2023
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data
  Distribution Recovery Steps Via Content Transfer
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer
Peter Ochieng
DiffM
30
0
0
18 Sep 2023
A Two-Stage Training Framework for Joint Speech Compression and
  Enhancement
A Two-Stage Training Framework for Joint Speech Compression and Enhancement
Jiayi Huang
Zeyu Yan
Wenbin Jiang
Fei Wen
27
0
0
08 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial
  Network
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
23
11
0
06 Sep 2023
Voice Morphing: Two Identities in One Voice
Voice Morphing: Two Identities in One Voice
Sushant Pani
Anurag Chowdhury
Morgan Sandler
Arun Ross
24
1
0
05 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech
  Synthesis
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
27
21
0
29 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using
  1D-2D CNN
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
38
4
0
14 Aug 2023
Neural Networks at a Fraction with Pruned Quaternions
Neural Networks at a Fraction with Pruned Quaternions
Sahel Mohammad Iqbal
Subhankar Mishra
28
4
0
13 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech
  with Adversarial Learning and Architecture Design
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
36
0
31 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
33
137
0
20 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph
  Reading
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
27
5
0
03 Jul 2023
MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial
  Learning
MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning
Mohammad Reza Hasanabadi
19
3
0
22 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
33
11
0
05 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
14
2
0
02 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural
  vocoders for high-quality audio synthesis
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
32
79
0
01 Jun 2023
Towards Selection of Text-to-speech Data to Augment ASR Training
Towards Selection of Text-to-speech Data to Augment ASR Training
Shuo Liu
Leda Sari
Chunyang Wu
Gil Keren
Yuan Shangguan
Jay Mahadeokar
Ozlem Kalinli
16
4
0
30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
48
66
0
30 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayutsh Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
33
14
0
27 May 2023
Efficient Neural Music Generation
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
34
49
0
25 May 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound
  Synthesis based on Frequency Modulation
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation
Zhe Ye
Wei Xue
Xuejiao Tan
Qi-fei Liu
Yi-Ting Guo
26
2
0
22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction
  of Amplitude and Phase Spectra
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
39
13
0
13 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
32
84
0
12 May 2023
JaxPruner: A concise library for sparsity research
JaxPruner: A concise library for sparsity research
Jooyoung Lee
Wonpyo Park
Nicole Mitchell
Jonathan Pilault
J. Obando-Ceron
...
Hong-Seok Kim
Yann N. Dauphin
Karolina Dziugaite
Pablo Samuel Castro
Utku Evci
46
14
0
27 Apr 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High
  Fidelity Speech Synthesis
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
24
1
0
26 Apr 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
38
39
0
25 Apr 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End
  Speech Model
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
26
1
0
23 Apr 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion
  Models
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Yuancheng Wang
Zeqian Ju
Xuejiao Tan
Lei He
Zhizheng Wu
Jiang Bian
Sheng Zhao
DiffM
19
47
0
03 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for
  Generative Adversarial Network-Based Speech Synthesis
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
34
9
0
24 Mar 2023
Controllable Prosody Generation With Partial Inputs
Controllable Prosody Generation With Partial Inputs
Dan-Andrei Iliescu
D. Mohan
Tian Huey Teh
Zack Hodari
30
1
0
14 Mar 2023
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
Weixin Chen
D. Song
Bo-wen Li
DiffM
37
74
0
10 Mar 2023
Scaling strategies for on-device low-complexity source separation with
  Conv-Tasnet
Scaling strategies for on-device low-complexity source separation with Conv-Tasnet
Mohamed Nabih Ali
Francesco Paissan
Daniele Falavigna
Alessio Brutti
23
2
0
06 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
A General Framework for Learning Procedural Audio Models of
  Environmental Sounds
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM
DRL
35
1
0
04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised
  Speech and Text Representations
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants
Domna Bilika
Nikoletta Michopoulou
E. Alepis
Constantinos Patsakis
33
8
0
20 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
130
3
0
18 Feb 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving
  Source Separation
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Shahar Lutati
Eliya Nachmani
Lior Wolf
DiffM
36
14
0
25 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for
  Synthetic Emotional Speech Augmentation
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
31
23
0
10 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to
  Speech
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
37
22
0
30 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis
  Dataset
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
28
1
0
11 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with
  Very Low Computational Complexity
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
30
4
0
08 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice
  Source Features
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features
T. U. K. Reddy
Sahukari Chaitanya Varun
Kota Pranav Kumar Sankala Sreekanth
K. Murty
23
0
0
05 Dec 2022
Are Straight-Through gradients and Soft-Thresholding all you need for
  Sparse Training?
Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
A. Vanderschueren
Christophe De Vleeschouwer
MQ
25
9
0
02 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
25
103
0
29 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
24
1
0
25 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech
  Representation using Differentiable Digital Signal Processing
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
J. Webber
Cassia Valentini-Botinhao
Evelyn Williams
G. Henter
Simon King
16
9
0
13 Nov 2022
Previous
12345...8910
Next