AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines

22 October 2020
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li

Papers citing "AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines"

50 / 122 papers shown
Title
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
19
2
0
03 Jul 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
32
6
0
25 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
32
73
0
06 Jun 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
42
0
0
31 May 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Ziqian Wang
Pengcheng Guo
Linfu Xie
AAML
29
1
0
30 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
42
1
0
26 May 2023
ADD 2023: the Second Audio Deepfake Detection Challenge
Jiangyan Yi
Jianhua Tao
Ruibo Fu
Xinrui Yan
Chenglong Wang
...
Zhengqi Wen
Shan Liang
Zheng Lian
Shuai Nie
Haizhou Li
90
95
0
23 May 2023
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios
Yuyue Wang
Huanhou Xiao
Yihan Wu
Ruihua Song
23
0
0
20 May 2023
Parameter-Efficient Learning for Text-to-Speech Accent Adaptation
Lijie Yang
Chao-Han Huck Yang
Jen-Tzung Chien
22
11
0
18 May 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion
Xintao Zhao
Shuai Wang
Yang Chao
Zhiyong Wu
Helen Meng
35
3
0
16 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
41
115
0
07 May 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
28
1
0
11 Dec 2022
IMaSC -- ICFOSS Malayalam Speech Corpus
D. Gopinath
Thennal D K
Vrinda V. Nair
Swaraj K S
G. Sachin
AuLLM
28
1
0
23 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
61
15
0
02 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
27
0
0
31 Oct 2022
SAN: a robust end-to-end ASR model architecture
Zeping Min
Qian Ge
Guanhua Huang
24
2
0
27 Oct 2022
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance
Yuan-Jui Chen
Ming Tu
Tang-Chun Li
Xin Li
Qiuqiang Kong
Jiaxin Li
Zhichao Wang
Qiao Tian
Yuping Wang
Yuxuan Wang
37
11
0
27 Oct 2022
10 hours data is all you need
Zeping Min
Qian Ge
Zhong Li
26
2
0
24 Oct 2022
Spatial-DCCRN: DCCRN equipped with frame-level angle feature and hybrid filtering for multi-channel speech enhancement
Shubo Lv
Yihui Fu
Yukai Jv
Linfu Xie
Weixin Zhu
Wei Rao
Yannan Wang
19
8
0
17 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
23
47
0
13 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
19
0
0
06 Oct 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu
Pengkai Yin
Rui Liu
F. Bao
Guanglai Gao
18
5
0
22 Sep 2022
Open Challenges in Synthetic Speech Detection
Luca Cuccovillo
Christoforos Papastergiopoulos
Anastasios Vafeiadis
Artem Yaroshchuk
P. Aichroth
K. Votis
Dimitrios Tzovaras
46
27
0
15 Sep 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
37
0
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
J. Tao
Tao Wang
Shiming Wang
Ruibo Fu
24
26
0
12 Jul 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
Wei-Ping Huang
Po-Chun Chen
Sung-Feng Huang
Hung-yi Lee
24
1
0
27 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
31
24
0
20 May 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang
Lantian Li
Wenqiang Du
Dong Wang
48
0
0
08 Apr 2022
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
46
26
0
07 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Dan Su
Helen Meng
26
22
0
03 Apr 2022
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
27
10
0
28 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
16
20
0
24 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan
Yongbin Feng
Mingming Ye
Cheng Tuo
Minghang Zhang
17
3
0
18 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
21
0
0
06 Mar 2022
The Vicomtech Audio Deepfake Detection System based on Wav2Vec2 for the 2022 ADD Challenge
Juan M. Martín-Donas
Aitor Álvarez
35
98
0
03 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
28
3
0
02 Mar 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu
Chengxi Li
Yi Ren
Zhiying Zhu
Zhou Zhao
DiffM
33
14
0
27 Feb 2022
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
Bo Zhao
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DiffM
18
22
0
22 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge
Jiangyan Yi
Ruibo Fu
J. Tao
Shuai Nie
Haoxin Ma
...
Le Xu
Zhengqi Wen
Haizhou Li
Zheng Lian
Bin Liu
14
175
0
17 Feb 2022
Partially Fake Audio Detection by Self-attention-based Fake Span Discovery
Haibin Wu
Heng-Cheng Kuo
Naijun Zheng
Kuo-Hsuan Hung
Hung-yi Lee
Yu Tsao
Hsin-Min Wang
Helen Meng
35
36
0
14 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
Ziyi Chen
Hua Hua
Yuxiang Zhang
Ming Li
Pengyuan Zhang
27
0
0
29 Jan 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
Dabiao Ma
Yitong Zhang
Meng Li
Feng Ye
14
1
0
19 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
25
95
0
19 Jan 2022
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
29
13
0
15 Jan 2022
MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder
Shoutong Wang
Jinglin Liu
Yi Ren
Zhen Wang
Changliang Xu
Zhou Zhao
25
7
0
11 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
15
14
0
02 Jan 2022
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
Rongjie Huang
Feiyang Chen
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
33
98
0
20 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Haoran Sun
Lantian Li
T. Zheng
Dong Wang
CVBM
19
0
0
24 Nov 2021
Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition
Yi-Chang Chen
Chun-Yen Cheng
Chien-An Chen
Ming-Chieh Sung
Yi-Ren Yeh
13
6
0
16 Nov 2021