Neural Voice Cloning with a Few Samples

14 February 2018
Sercan Ö. Arik
Jitong Chen
Kainan Peng
Ming-Yu Liu
Yanqi Zhou

Papers citing "Neural Voice Cloning with a Few Samples"

50 / 71 papers shown
The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
Shuoyi Zhou
Yixuan Zhou
Weiqing Li
Jun Chen
Runchuan Ye
Weihao Wu
Zijian Lin
Shun Lei
Zhiyong Wu
107
1
0
02 Dec 2024
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
36
0
0
17 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
43
2
0
30 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
47
0
0
24 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
46
2
0
09 Jun 2024
Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms
J. J. Mathew
Rakin Ahsan
Sae Furukawa
Jagdish Gautham Krishna Kumar
Huzaifa Pallan
Agamjeet Singh Padda
Sara Adamski
Madhu Reddiboina
Arjun Pankajakshan
29
2
0
18 Mar 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
59
41
0
30 Jan 2024
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
38
3
0
21 Mar 2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Suhee Jo
Younggun Lee
Yookyung Shin
Yeongtae Hwang
Taesu Kim
13
3
0
15 Mar 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
48
648
0
05 Jan 2023
Voice conversion with limited data and limitless data augmentations
Olga Slizovskaia
Jordi Janer
Pritish Chandna
Oscar Mayor
30
1
0
27 Dec 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
41
18
0
01 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
20
0
0
28 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
37
14
0
12 Oct 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
25
1
0
26 Sep 2022
AutoLV: Automatic Lecture Video Generator
Wen Wang
Yang Song
Sanjay Jha
VGen
21
3
0
19 Sep 2022
On the Horizon: Interactive and Compositional Deepfakes
Eric Horvitz
16
27
0
05 Sep 2022
Deepfake: Definitions, Performance Metrics and Standards, Datasets and Benchmarks, and a Meta-Review
Enes Altuncu
V. N. Franqueira
Shujun Li
28
11
0
21 Aug 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Dan Su
DiffM
52
12
0
05 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
68
0
0
28 Jun 2022
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Kun Song
Heyang Xue
Xinsheng Wang
Jian Cong
Yongmao Zhang
Linfu Xie
Bing Yang
Xiong Zhang
Dan Su
19
5
0
01 Jun 2022
Time Domain Adversarial Voice Conversion for ADD 2022
Cheng Wen
Tingwei Guo
Xi Tan
Rui Yan
Shuran Zhou
Chuandong Xie
Wei Zou
Xiangang Li
18
4
0
19 Apr 2022
Self-supervised learning for robust voice cloning
Konstantinos Klapsas
Nikolaos Ellinas
Karolos Nikitaras
G. Vamvoukakis
Panos Kakoulidis
...
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
SSL
32
6
0
07 Apr 2022
Improve few-shot voice cloning using multi-modal learning
Haitong Zhang
Yue Lin
21
8
0
18 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
33
3
0
02 Mar 2022
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module
Adam Gabryś
Goeric Huybrechts
M. Ribeiro
C. Chien
Julian Roth
Giulia Comini
Roberto Barra-Chicote
Bartek Perz
Jaime Lorenzo-Trueba
41
21
0
16 Feb 2022
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
18
56
0
07 Nov 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
27
9
0
07 Oct 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
40
39
0
20 Sep 2021
Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors
Hasam Khalid
Minha Kim
Shahroz Tariq
Simon S. Woo
23
82
0
07 Sep 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
47
6
0
16 Aug 2021
FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
Hasam Khalid
Shahroz Tariq
Minha Kim
Simon S. Woo
36
185
0
11 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
31
36
0
29 Jun 2021
Learning to Compensate: A Deep Neural Network Framework for 5G Power Amplifier Compensation
Po-Yu Chen
Hao-Wei Chen
Yi-Min Tsai
Hsien-Kai Kuo
Hantao Huang
Hsin-Hung Chen
Sheng-Hong Yan
Wei-Lun Ou
Chia-Ming Cheng
36
3
0
15 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Gal Greshler
Tamar Rott Shaham
T. Michaeli
18
25
0
11 Jun 2021
TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
Haoang Chi
Feng Liu
Wenjing Yang
L. Lan
Tongliang Liu
Bo Han
William Cheung
James T. Kwok
35
27
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
18
55
0
24 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
37
188
0
01 Mar 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
17
8
0
03 Jan 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
29
5
0
14 Dec 2020
Facial Keypoint Sequence Generation from Audio
Prateek Manocha
Prithwijit Guha
3DH
VGen
23
0
0
02 Nov 2020
A Cross-Verification Approach for Protecting World Leaders from Fake and Tampered Audio
Mengyi Shan
T. Tsai
16
8
0
23 Oct 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
33
2
0
12 Jun 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
37
65
0
18 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
24
41
0
10 May 2020