ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.15687
  4. Cited By
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

23 June 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
Rashel Moritz
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
    AuLLM
ArXivPDFHTML

Papers citing "Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale"

50 / 205 papers shown
Title
Enhancing Anti-spoofing Countermeasures Robustness through Joint
  Optimization and Transfer Learning
Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning
Yikang Wang
Xingming Wang
Hiromitsu Nishizaki
Ming Li
AAML
37
0
0
29 Jul 2024
Speech Editing -- a Summary
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
34
0
0
24 Jul 2024
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech
  SpeechT5 Model
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
Jan Lehecka
Z. Hanzlícek
J. Matousek
Daniel Tihelka
40
0
0
24 Jul 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of
  Flow-Matching-Based Zero-Shot Text-to-Speech
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
28
7
0
17 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
54
33
0
11 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for
  Large-Scale Speech Generation
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
41
38
0
07 Jul 2024
Diffusion Models and Representation Learning: A Survey
Diffusion Models and Representation Learning: A Survey
Michael Fuest
Pingchuan Ma
Ming Gui
Johannes S. Fischer
Vincent Tao Hu
Bjorn Ommer
DiffM
49
20
0
30 Jun 2024
Natural Language but Omitted? On the Ineffectiveness of Large Language
  Models' privacy policy from End-users' Perspective
Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective
Shuning Zhang
Haobin Xing
Xin Yi
Hewu Li
PILM
53
0
0
26 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
52
0
26 Jun 2024
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual
  Text-to-Speech Adaptation
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Yingting Li
Ambuj Mehrish
Bryan Chew
Bo Cheng
Soujanya Poria
45
0
0
25 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
58
0
0
24 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient
  Zero-Shot Text to Speech Synthesizers
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
46
3
0
22 Jun 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible
  Acoustic Reception and Reaction
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
44
4
0
18 Jun 2024
Joint Audio and Symbolic Conditioning for Temporally Controlled
  Text-to-Music Generation
Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation
Or Tal
Alon Ziv
Itai Gat
Felix Kreuk
Yossi Adi
58
13
0
16 Jun 2024
Benchmarking Children's ASR with Supervised and Self-supervised Speech
  Foundation Models
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
43
7
0
15 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech
  Translation
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Nameer Hirschkind
Xiao Yu
Mahesh Kumar Nandwana
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
42
0
0
14 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
Kaidi Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
46
3
0
12 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via
  Monotonic Alignment
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
49
15
0
12 Jun 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient
  Fine-Tuning
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
34
0
0
10 Jun 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot
  TTS
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
Zirun Zhu
...
Yufei Xia
Jinzhu Li
Sheng Zhao
Jinyu Li
Naoyuki Kanda
48
3
0
09 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
50
15
0
08 Jun 2024
Should you use a probabilistic duration model in TTS? Probably!
  Especially for spontaneous speech
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
38
1
0
08 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text
  to Speech Synthesizers
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
47
67
0
08 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
46
69
0
07 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech
  Synthesis
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
42
6
0
06 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for
  Noise-Robust Expressive Speech-to-Speech Translation
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
35
2
0
04 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Wenjie Qu
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
56
83
0
04 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar
  Latent Transformer Diffusion Models
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
67
26
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and
  Zero-shot Language Style Control With Decoupled Codec
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
60
8
0
03 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGen
DiffM
41
12
0
01 Jun 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony
  Preservation
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
49
3
0
28 May 2024
Deep MMD Gradient Flow without adversarial training
Deep MMD Gradient Flow without adversarial training
Alexandre Galashov
Valentin De Bortoli
Arthur Gretton
DiffM
45
8
0
10 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
51
4
0
30 Apr 2024
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
Yiyuan Yang
Ming Jin
Haomin Wen
Chaoli Zhang
Keli Zhang
...
Bin Yang
Zenglin Xu
Jiang Bian
Shirui Pan
Qingsong Wen
DiffM
AI4TS
SyDa
42
41
0
29 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Ruizhe Huang
Xiaohui Zhang
Zhaoheng Ni
Li Sun
Moto Hira
...
Vineel Pratap
Sanjeev Khudanpur
Shinji Watanabe
Daniel Povey
Sanjeev Khudanpur
37
4
0
22 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
58
1
0
16 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like
  Multi-talker Conversations
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
36
1
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving
  Zero-Shot Voice Editing
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
47
4
0
10 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting
  for Text-to-Speech Synthesis
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
37
23
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot
  Text-to-Speech
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
74
42
0
03 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
74
64
0
25 Mar 2024
A Roadmap Towards Automated and Regulated Robotic Systems
A Roadmap Towards Automated and Regulated Robotic Systems
Yihao Liu
Mehran Armand
55
2
0
21 Mar 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless
  Speech-to-Speech Translation with Speaker Style Preservation
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
38
4
0
19 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned
  Speech Synthesis
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
38
1
0
19 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
49
147
0
05 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech
  Synthesis
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
49
0
0
01 Mar 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
Anton Van Den Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
51
12
0
20 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot
  Text-to-Speech
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
40
10
0
14 Feb 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
35
8
0
12 Feb 2024
Previous
12345
Next