ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1910.06711
  4. Cited By
MelGAN: Generative Adversarial Networks for Conditional Waveform
  Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

8 October 2019
Kundan Kumar
Rithesh Kumar
T. Boissière
L. Gestin
Wei Zhen Teoh
Jose M. R. Sotelo
A. D. Brébisson
Yoshua Bengio
Aaron Courville
    GAN
ArXivPDFHTML

Papers citing "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis"

50 / 208 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
20
0
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
45
0
0
12 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
42
3
0
12 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yujie Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
48
0
0
27 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
62
0
0
07 Apr 2025
Less is More for Synthetic Speech Detection in the Wild
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
Shengshi Yao
Jincheng Dai
Xiaoqi Qin
Sixian Wang
Siye Wang
K. Niu
Ping Zhang
38
0
0
22 Jan 2025
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
14
0
01 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
56
3
0
23 Sep 2024
LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling
LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling
Yubo Huang
Xin Lai
Muyang Ye
Anran Zhu
Zixi Wang
Jingzehua Xu
Shuai Zhang
Zhiyuan Zhou
Weijie Niu
55
1
0
13 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing
  Yourself
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
41
0
0
10 Sep 2024
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
Yidi Li
Yihan Li
Yixin Guo
Bin Ren
Zhenhuan Xu
Hao Guo
Hong Liu
N. Sebe
47
0
0
26 Aug 2024
DriftGAN: Using historical data for Unsupervised Recurring Drift
  Detection
DriftGAN: Using historical data for Unsupervised Recurring Drift Detection
Christofer Fellicious
Sahib Julka
Lorenz Wendlinger
Michael Granitzer
AI4TS
21
1
0
09 Jul 2024
Dataset-Distillation Generative Model for Speech Emotion Recognition
Dataset-Distillation Generative Model for Speech Emotion Recognition
Fabian Ritter-Gutierrez
Kuan Po Huang
Jeremy H. M Wong
Dianwen Ng
Hung-yi Lee
Nancy F. Chen
Eng Siong Chng
DD
44
0
0
05 Jun 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight
  Text-to-Speech
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
43
0
0
13 Mar 2024
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a
  Diffusion Probabilistic Model
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
Keiichi Tokuda
DiffM
35
2
0
22 Feb 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
59
41
0
30 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
28
0
15 Dec 2023
An Initial Investigation of Neural Replay Simulator for Over-the-Air
  Adversarial Perturbations to Automatic Speaker Verification
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
Jiaqi Li
Li Wang
Liumeng Xue
Lei Wang
Zhizheng Wu
AAML
27
3
0
09 Oct 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and
  Periodic Inductive Bias
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Sipan Li
Songxiang Liu
Lu Zhang
Xiang Li
Yanyao Bian
Chao Weng
Zhiyong Wu
Helen Meng
36
2
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit
  for Neural Speech Codec
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
34
54
0
14 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in
  Music Audio Signals
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Yiming Wu
CoGe
DRL
32
0
0
06 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jun Liu
78
31
0
27 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech
  with Adversarial Learning and Architecture Design
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
36
0
31 Jul 2023
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based
  onFlexible Location Gradient Reversal Layer
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based onFlexible Location Gradient Reversal Layer
Md. Asif Jalal
Pablo Peso Parada
Jisi Zhang
Karthikeyan P. Saravanan
Mete Ozay
Myoungji Han
Jung In Lee
Seokyeong Jung
28
1
0
25 Jul 2023
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony
  in Talking Head Generation
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
Louis Airale
Dominique Vaufreydaz
Xavier Alameda-Pineda
23
1
0
04 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph
  Reading
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
25
5
0
03 Jul 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
26
0
0
31 May 2023
mdctGAN: Taming transformer-based GAN for speech super-resolution with
  Modified DCT spectra
mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra
Chenhao Shuai
Chaohua Shi
Lu Gan
Hongqing Liu
30
8
0
18 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net
  Encoder With Multiple STFTs
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
27
1
0
18 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction
  of Amplitude and Phase Spectra
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
34
13
0
13 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by
  Unsupervised Learning from Voice Recordings
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
37
1
0
09 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High
  Fidelity Speech Synthesis
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
24
1
0
26 Apr 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
38
39
0
25 Apr 2023
Affective social anthropomorphic intelligent system
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
17
1
0
19 Apr 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive
  Structured Pruning
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
33
3
0
21 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
PITS: Variational Pitch Inference without Fundamental Frequency for
  End-to-End Pitch-controllable TTS
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
130
3
0
18 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech
  synthesis in Indian languages
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Sudhanshu Srivastava
Ishika Gupta
Anusha Prakash
Jom Kuriakose
H. Murthy
VLM
21
1
0
13 Feb 2023
Hypernetworks build Implicit Neural Representations of Sounds
Hypernetworks build Implicit Neural Representations of Sounds
Filip Szatkowski
Karol J. Piczak
Przemtslaw Spurek
Jacek Tabor
Tomasz Trzciñski
24
11
0
09 Feb 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes
Warning: Humans Cannot Reliably Detect Speech Deepfakes
Kimberly T. Mai
Sergi D. Bray
Toby O. Davies
Lewis D. Griffin
41
40
0
19 Jan 2023
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Kinyugo Maina
27
5
0
16 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to
  Speech
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
32
22
0
30 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis
  Dataset
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
28
1
0
11 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
75
25
0
08 Dec 2022
Generative Models for Improved Naturalness, Intelligibility, and Voicing
  of Whispered Speech
Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech
Dominik Wagner
Sebastian P. Bayerl
H. A. C. Maruri
Tobias Bocklet
24
7
0
04 Dec 2022
12345
Next