ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.03751
  4. Cited By
Recent Advances in Speech Language Models: A Survey
v1v2v3 (latest)

Recent Advances in Speech Language Models: A Survey

1 October 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
    AuLLM
ArXiv (abs)PDFHTML

Papers citing "Recent Advances in Speech Language Models: A Survey"

50 / 139 papers shown
Title
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
75
1
0
23 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLMReLMLRM
52
0
0
21 May 2025
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
Guangke Chen
Fu Song
Zhe Zhao
Xiaojun Jia
Yang Liu
Yanchen Qiao
Weizhe Zhang
AuLLMAAML
80
1
0
20 May 2025
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Debarpan Bhattacharya
Apoorva Kulkarni
Sriram Ganapathy
60
0
0
19 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
91
4
0
05 May 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
116
0
0
22 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
139
14
0
11 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
116
1
0
03 Apr 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
101
2
0
13 Mar 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
81
3
0
19 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
156
4
0
28 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
159
25
0
17 Jan 2025
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
134
2
0
06 Dec 2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and
  Generation
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Guangzhi Sun
Lu Lu
Yansen Wang
Chao Zhang
AuLLM
129
11
0
27 Nov 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-yu Huang
Wei-Chih Chen
Shu-Wen Yang
Andy T. Liu
Chen-An Li
...
David Harwath
Shinji Watanabe
Hung-yi Lee
Shinji Watanabe
Hung-yi Lee
ELMAuLLM
69
29
0
08 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
111
9
0
04 Nov 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model
  with Frozen LLM
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang
Yangze Li
Chaoyou Fu
Yunhang Shen
Lei Xie
Ke Li
Xing Sun
Long Ma
AuLLMMLLM
132
40
0
01 Nov 2024
GPT-4o System Card
GPT-4o System Card
OpenAI OpenAI
:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
226
1,038
0
25 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLMELM
119
46
0
24 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDaBDLAuLLMVLM
127
20
0
23 Oct 2024
VoiceBench: Benchmarking LLM-Based Voice Assistants
VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen
Xianghu Yue
Chen Zhang
Xiaoxue Gao
R. Tan
Haoyang Li
ELMAuLLM
111
29
0
22 Oct 2024
What Do Speech Foundation Models Not Learn About Speech?
What Do Speech Foundation Models Not Learn About Speech?
Abdul Waheed
Hanin Atwany
Bhiksha Raj
Rita Singh
SSL
93
4
0
16 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice
  Interaction Abilities
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Xipeng Qiu
AuLLM
95
9
0
09 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLMMLLMVLM
168
29
0
26 Sep 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue
  Agents
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
AuLLMOffRL
97
19
0
23 Sep 2024
Moshi: a speech-text foundation model for real-time dialogue
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
AuLLM
147
150
0
17 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELMAuLLM
124
7
0
11 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
99
54
0
10 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech
  Generation
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
76
3
0
06 Sep 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLMVGenVLMSyDaLRM
79
73
0
29 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Rongrong Ji
Xing Sun
Ran He
Caifeng Shan
Xing Sun
MLLM
105
95
0
09 Aug 2024
Language Model Can Listen While Speaking
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Yansen Wang
Xie Chen
AuLLM
95
28
0
05 Aug 2024
Qwen2 Technical Report
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLMVLMMU
196
981
0
15 Jul 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer
  based on Supervised Semantic Tokens
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Zhihao Du
Qian Chen
Shiliang Zhang
Kai Hu
Heng Lu
...
Siqi Zheng
Yue Gu
Ziyang Ma
Zhifu Gao
Zhijie Yan
DiffM
76
143
0
07 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for
  Natural Interaction Between Humans and LLMs
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
125
57
0
04 Jul 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLMELMLM&MA
145
35
0
23 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All
  Tools
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
132
643
0
18 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Yanzhe Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
100
106
0
04 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical
  Transformer
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
64
7
0
03 Jun 2024
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in
  Zero and Few-shot Learning
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
Keqi Deng
Guangzhi Sun
Phil Woodland
VLM
54
4
0
01 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
Katrin Kirchhoff
75
43
0
14 May 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
SpeechAlign: Aligning Speech Generation to Human Preferences
Dong Zhang
Zhaowei Li
Shimin Li
Xin Zhang
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
ALMAuLLM
87
18
0
08 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
122
79
0
25 Mar 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative
  Comprehension
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MAAuLLMALM
87
84
0
12 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLMVLM
92
53
0
08 Feb 2024
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
Dong Zhang
Xin Zhang
Jun Zhan
Shimin Li
Yaqian Zhou
Xipeng Qiu
AuLLMBDL
81
17
0
24 Jan 2024
Mixtral of Experts
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoELLMAG
164
1,123
0
08 Jan 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
150
2,786
0
01 Dec 2023
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Ke Li
Junteng Jia
Shangguan Yuan
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
LM&MAMLLMAuLLM
91
42
0
12 Nov 2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Jing Pan
Jian Wu
Yashesh Gaur
S. Sivasankaran
Zhuo Chen
Shujie Liu
Jinyu Li
ELM
73
29
0
03 Nov 2023
123
Next