ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.14321
  4. Cited By
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

25 January 2024
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
    VLM
ArXivPDFHTML

Papers citing "VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech"

18 / 18 papers shown
Title
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
27
0
0
14 Apr 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
41
0
0
10 Jan 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Yongjun Xu
Yizhi Zhou
Haina Zhu
Hao Li
KELM
176
1
0
18 Dec 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
36
5
0
29 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction
  and Speculative Decoding
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
37
0
0
17 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
  Matching
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
33
52
0
09 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence
  Alignment in Neural Text-to-Speech
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
18
0
0
07 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for
  Neural Codec Language Models
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
53
1
0
28 Sep 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via
  Multi-Scale Speech Coding and Generation
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo
Fenglong Xie
Dongchao Yang
Xixin Wu
Helen Meng
43
1
0
18 Sep 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with
  Unified Multilingual Codec Language Modelling
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
30
5
0
28 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep
  Speaker Representation Learning
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
48
0
26 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via
  Monotonic Alignment
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
41
14
0
12 Jun 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
40
1
0
30 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting
  for Text-to-Speech Synthesis
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
29
23
0
04 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
74
57
0
25 Mar 2024
Generative Spoken Language Modeling from Raw Audio
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
191
337
0
01 Feb 2021
1