Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.02882
Cited By
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
5 April 2019
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhehuai Chen
Yonghui Wu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech"
50 / 221 papers shown
Title
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
22
0
0
19 May 2025
Inference Attacks for X-Vector Speaker Anonymization
L. A. Bauer
Wenxuan Bao
Malvika Jadhav
Vincent Bindschaedler
32
0
0
13 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
47
0
0
12 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
48
0
0
08 May 2025
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
Zhongweiyang Xu
Xulin Fan
Zhong-Qiu Wang
Xilin Jiang
Romit Roy Choudhury
DiffM
57
0
0
08 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
219
0
0
02 May 2025
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
Wen Liu
Dongdong Lin
44
0
0
29 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zheng Yang
Aoxiong Yin
Ruibin Yuan
Wenjie Qu
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
35
0
0
22 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
R. Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
221
0
0
12 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
59
0
0
11 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
65
0
0
07 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
91
0
0
15 Mar 2025
The time scale of redundancy between prosody and linguistic context
Tamar I. Regev
Chiebuka Ohams
Shaylee Xie
Lukas Wolf
Evelina Fedorenko
Alex Warstadt
E. Wilcox
Tiago Pimentel
41
0
0
14 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
64
0
0
11 Mar 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
68
0
0
26 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
39
2
0
24 Feb 2025
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Wei Zhao
Pengxiang Ding
Hao Fei
Zhefei Gong
Shuanghao Bai
Han Zhao
Donglin Wang
93
6
0
24 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
63
3
0
07 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
62
1
0
05 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yali Wang
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
72
4
0
28 Jan 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
77
0
0
23 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman Hagai Aronowitz
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
66
0
0
15 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
90
3
0
03 Jan 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
Tian-Hao Zhang
Jiawei Zhang
Jun Wang
Xinyuan Qian
Xu-cheng Yin
CVBM
54
0
0
02 Jan 2025
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
53
5
0
04 Nov 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
54
6
0
29 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
41
1
0
21 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
34
0
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
40
55
0
09 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
17
0
01 Oct 2024
Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects
Annie Chu
P. O'Reilly
Julia Barnett
Bryan Pardo
CLIP
44
1
0
27 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLM
MLLM
VLM
82
23
0
26 Sep 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
64
5
0
25 Sep 2024
Voice Conversion-based Privacy through Adversarial Information Hiding
J. Webber
O. Watts
G. Henter
Jennifer Williams
Simon King
45
0
0
23 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
67
2
0
23 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
38
3
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
236
0
0
14 Sep 2024
OpenACE: An Open Benchmark for Evaluating Audio Coding Performance
Jozef Coldenhoff
Niclas Granqvist
Milos Cernak
33
0
0
12 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
48
2
0
30 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
60
36
0
29 Aug 2024
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation
Maximilian Baronig
Romain Ferrand
Silvester Sabathiel
Robert Legenstein
53
3
0
14 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
37
4
0
12 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
42
2
0
12 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Yali Wang
Xie Chen
AuLLM
55
24
0
05 Aug 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
48
4
0
22 Jul 2024
Target conversation extraction: Source separation using turn-taking dynamics
Tuochao Chen
Qirui Wang
Bohan Wu
Malek Itani
Sefik Emre Eskimez
Takuya Yoshioka
Shyamnath Gollakota
45
4
0
15 Jul 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
43
9
0
15 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
49
1
0
08 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
47
2
0
04 Jul 2024
1
2
3
4
5
Next