ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines

arXiv: 2010.11567
22 October 2020
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li

Papers citing "AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines"

50 / 122 papers shown
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
32
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhengyuan Yang
Aoxiong Yin
Ruibin Yuan
Wenjie Qu
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
Protecting Your Voice: Temporal-aware Robust Watermarking
Yue Li
Weizhi Liu
Dongdong Lin
37
0
0
21 Apr 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xinbing Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
84
13
0
03 Mar 2025
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Chang-rui Liu
Haolin Wu
Xi Yang
Kui Zhang
Cong Wu
Wenbo Zhang
Nenghai Yu
Tianwei Zhang
Qing Guo
Jie Zhang
AAML
39
0
0
02 Mar 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
68
0
0
26 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM
AuLLM
CLIP
VLM
57
1
0
23 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Yiming Li
AuLLM
SyDa
VLM
107
0
0
18 Feb 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir
Youness Samih
Suraj Maharjan
Tim Polzehl
Sebastian Möller
73
1
0
05 Feb 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Junan Zhang
Jing Yang
Zihao Fang
Yansen Wang
Zehua Zhang
Zhuo Wang
Fan Fan
Zhikai Wu
41
3
0
26 Jan 2025
Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
Hanbin Bae
Byungjun Kang
Jiwon Kim
Jaeyong Hwang
Hosang Sung
Hoon-Young Cho
3DV
28
0
0
06 Jan 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
58
11
0
23 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang
Fengping Wang
Yicheng Zhong
Huawei Wei
Zhisheng Wang
23
0
0
21 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
34
0
0
09 Oct 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
52
4
0
24 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
34
3
0
15 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
40
1
0
10 Sep 2024
PB-LRDWWS System for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
Shiyao Wang
Jiaming Zhou
Shiwan Zhao
Yong Qin
40
1
0
07 Sep 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
29
10
0
24 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
41
0
0
11 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
52
1
0
09 Aug 2024
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
58
5
0
31 Jul 2024
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang
Jiahong Zhang
Yong Ren
Man Yao
Richard D. Shang
Boxing Xu
Guoqi Li
DiffM
31
2
0
17 Jul 2024
Target conversation extraction: Source separation using turn-taking dynamics
Tuochao Chen
Qirui Wang
Bohan Wu
Malek Itani
Sefik Emre Eskimez
Takuya Yoshioka
Shyamnath Gollakota
37
4
0
15 Jul 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
43
9
0
15 Jul 2024
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
Siding Zeng
Jiangyan Yi
Jianhua Tao
Yujie Chen
Shan Liang
Yong Ren
Xiaohui Zhang
41
0
0
11 Jul 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
44
1
0
20 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
36
1
0
18 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
37
2
0
13 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
51
3
0
10 Jun 2024
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
Ruiqi Li
Rongjie Huang
Yongqi Wang
Zhiqing Hong
Zhou Zhao
42
1
0
04 Jun 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
37
4
0
30 Apr 2024
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
Zhiqing Hong
Rongjie Huang
Xize Cheng
Yongqi Wang
Ruiqi Li
Fuming You
Zhou Zhao
Zhimeng Zhang
34
7
0
14 Apr 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
46
11
0
18 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
35
2
0
08 Mar 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
31
0
0
23 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
26
6
0
19 Jan 2024
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
23
2
0
26 Oct 2023
PromptSpeaker: Speaker Generation Based on Text Descriptions
Yongmao Zhang
Guanghou Liu
Yinjiao Lei
Yunlin Chen
Hao Yin
Lei Xie
Zhifei Li
25
11
0
08 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
28
115
0
01 Oct 2023
VoiceLens: Controllable Speaker Generation and Editing with Flow
Yao Shi
Ming Li
BDL
32
1
0
25 Sep 2023
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
15
12
0
25 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
22
33
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
21
3
0
15 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
32
4
0
02 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
29
5
0
01 Sep 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
36
64
0
31 Aug 2023
VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Jia-Jyu Su
Pang-Chen Liao
Yen-Ting Lin
Wu-Hao Li
Guan-Ting Liou
...
Wei-Cheng Chen
Jen-Chieh Chiang
Wen-Yang Chang
Pin-Han Lin
Chen-Yu Chiang
23
1
0
27 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023