ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

8 February 2023
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
arXiv: 2302.04215

Papers citing "A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech"

27 / 27 papers shown
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yunhong Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
65
1
0
18 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
49
0
0
16 Sep 2024
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
Xiaoxue Gao
Chen Zhang
Yiming Chen
Huayun Zhang
Nancy F. Chen
43
6
0
16 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
38
1
0
13 Sep 2024
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
56
5
0
31 Jul 2024
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondřej Klejch
Peter Bell
26
0
0
17 Jul 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
38
3
0
05 Jun 2024
M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
49
1
0
21 Mar 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
38
13
0
29 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
29
11
0
01 Feb 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
22
6
0
19 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
24
7
0
05 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
24
21
0
22 Dec 2023
Expand-and-Quantize: Unsupervised Semantic Segmentation Using High-Dimensional Space and Product Quantization
Jiyoung Kim
Kyuhong Shim
Insu Lee
B. Shim
19
2
0
12 Dec 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
19
6
0
26 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
26
4
0
01 Oct 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
42
5
0
11 Jul 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
27
0
0
31 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
145
118
0
04 May 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
35
4
0
05 Mar 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
19
1
0
18 Oct 2022
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
50
60
0
15 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
253
695
0
27 Aug 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
188
337
0
01 Feb 2021