Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.17196
Cited By
v1
v2 (latest)
VoiceBench: Benchmarking LLM-Based Voice Assistants
22 October 2024
Yiming Chen
Xianghu Yue
Chen Zhang
Xiaoxue Gao
R. Tan
Haoyang Li
ELM
AuLLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VoiceBench: Benchmarking LLM-Based Voice Assistants"
18 / 18 papers shown
Title
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
67
1
0
23 May 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
95
6
0
22 Apr 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Yanjie Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
97
0
0
31 Mar 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
Tiejun Zhao
M. Yang
Mamba
AuLLM
92
0
0
16 Feb 2025
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
156
24
0
01 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLM
MLLM
VLM
122
28
0
26 Sep 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Rongrong Ji
Xing Sun
Ran He
Caifeng Shan
Xing Sun
MLLM
93
89
0
09 Aug 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Zhihao Du
Qian Chen
Shiliang Zhang
Kai Hu
Heng Lu
...
Siqi Zheng
Yue Gu
Ziyang Ma
Zhifu Gao
Zhijie Yan
DiffM
74
136
0
07 Jul 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
108
627
0
25 Apr 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
80
80
0
12 Feb 2024
Automatic Pronunciation Assessment -- A Review
Yassine El Kheir
Ahmed M. Ali
Shammur A. Chowdhury
53
6
0
21 Oct 2023
A Chat About Boring Problems: Studying GPT-based text normalization
Yang Zhang
Travis M. Bartley
Mariana Graterol-Fuenmayor
Vitaly Lavrukhin
Evelina Bakhturina
Boris Ginsburg
22
6
0
23 Sep 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
59
63
0
18 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
291
1,455
0
27 Jul 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
End-to-End Speech Recognition and Disfluency Removal
Paria Jamshid Lou
Mark Johnson
54
33
0
22 Sep 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
153
609
0
10 Mar 2020
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
91
1,600
0
13 Dec 2019
1