ResearchTrend.AI
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

23 June 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
Rashel Moritz
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
    AuLLM

Papers citing "Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale"

50 / 205 papers shown
From Graph Diffusion to Graph Classification
Jia Jun Cheng Xian
Sadegh Mahdavi
Renjie Liao
Oliver Schulte
GNN
DiffM
82
0
0
26 Nov 2024
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu
Joshua Geddes
Ziyu Guo
Haomiao Jiang
Mahesh Kumar Nandwana
66
0
0
15 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
53
5
0
04 Nov 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
42
0
0
30 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
42
1
0
30 Oct 2024
A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation
Alexander H. Liu
Qirui Wang
Yuan Gong
James Glass
40
0
0
29 Oct 2024
Asynchronous Tool Usage for Real-Time Agents
Antonio A. Ginart
Naveen Kodali
Jason D. Lee
Caiming Xiong
Silvio Savarese
John Emmons
LLMAG
SyDa
35
0
0
28 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
K R Prajwal
Bowen Shi
Matthew Lee
Apoorv Vyas
Andros Tjandra
...
Baishan Guo
Huiyu Wang
Triantafyllos Afouras
David Kant
Wei-Ning Hsu
43
5
0
27 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Zejun Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
34
2
0
22 Oct 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
H. Lu
Wei Xue
Zhou Zhao
13
3
0
16 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
63
2
0
16 Oct 2024
Hessian-Informed Flow Matching
Christopher Iliffe Sprague
Arne Elofsson
Hossein Azizpour
37
0
0
15 Oct 2024
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Zhenhui Ye
Tianyun Zhong
Yi Ren
Ziyue Jiang
Jiawei Huang
...
Chen Zhang
Zehan Wang
Xize Chen
Xiang Yin
Zhou Zhao
VGen
41
3
0
09 Oct 2024
Can DeepFake Speech be Reliably Detected?
Hongbin Liu
Youzheng Chen
Arun Narayanan
Athula Balachandran
Pedro J. Moreno
Lun Wang
AAML
40
1
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
40
55
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
39
0
0
09 Oct 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Rui Zhao
Jinyu Li
Ruchao Fan
Matt Post
41
1
0
07 Oct 2024
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Yuto Nishimura
Takumi Hirose
Masanari Ohi
Hideki Nakayama
Nakamasa Inoue
VLM
42
1
0
06 Oct 2024
Graded Suspiciousness of Adversarial Texts to Human
Shakila Mahjabin Tonni
Pedro Faustini
Mark Dras
AAML
35
0
0
06 Oct 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
43
0
0
04 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
42
3
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
17
0
01 Oct 2024
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
N. Pia
Martin Strauss
M. Multrus
B. Edler
44
0
0
26 Sep 2024
Evaluation of state-of-the-art ASR Models in Child-Adult Interactions
Aditya Ashvin
Rimita Lahiri
Aditya Kommineni
Somer Bishop
C. Lord
Sudarsana Reddy Kadiri
Shrikanth Narayanan
29
0
0
24 Sep 2024
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Pin-Jui Ku
Alexander H. Liu
Roman Korostik
Sung-Feng Huang
Szu-Wei Fu
Ante Jukić
44
2
0
24 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
31
2
0
24 Sep 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
Jiheum Yeom
Heeseung Kim
Jooyoung Choi
Che Hyun Lee
Nohil Park
Sungroh Yoon
37
1
0
24 Sep 2024
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
47
1
0
23 Sep 2024
Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation
Matthew Caren
Kartik Chandra
J. Tenenbaum
Jonathan Ragan-Kelley
Karima Ma
43
0
0
20 Sep 2024
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
Yang Chen
Yuhang Jia
Shiwan Zhao
Ziyue Jiang
Haoran Li
Jiarong Kang
Yong Qin
20
1
0
19 Sep 2024
SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu
Li Wang
Renqiang He
Haorui He
Lei Wang
Huadi Zheng
Jie Shi
Tong Xiao
Zhizheng Wu
37
1
0
17 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
31
5
0
16 Sep 2024
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
Xiaoyu Liu
Xu Li
Joan Serrà
Santiago Pascual
36
3
0
14 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
40
3
0
14 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
63
6
0
13 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
45
2
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
46
1
0
13 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
57
5
0
09 Sep 2024
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Zhengyang Chen
Bing Han
Shuai Wang
Yidi Jiang
Yanmin Qian
53
0
0
07 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
43
3
0
06 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
36
2
0
01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Xueyao Zhang
Shunsi Zhang
Zhizheng Wu
45
43
0
01 Sep 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
245
1
0
30 Aug 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
37
60
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
37
1
0
29 Aug 2024
Promises and challenges of generative artificial intelligence for human learning
Lixiang Yan
Samuel Greiff
Ziwen Teuber
Dragan Gašević
54
55
0
22 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
35
0
0
21 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
55
5
0
14 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
44
0
0
11 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
46
1
0
01 Aug 2024