ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Audio Large Language Models

AuLLM

Exploring the development and application of large language models specifically tailored for audio data processing and understanding.

All papers

50 / 535 papers shown
SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
Hitomi Jin Ling Tee
Chaoren Wang
Zijie Zhang
Zhizheng Wu
AuLLM, ELM
51
0
0
30 Oct 2025
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
Pedro Corrêa
João Lima
Victor Moreno
Lucas Ueda
Paula Dornhofer Paro Costa
AuLLM
108
0
0
29 Oct 2025
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
Kang Zhang
T. Pham
Suyeon Lee
Axi Niu
Arda Senocak
Joon Son Chung
AuLLM, VGen
0
0
0
28 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM, LRM
52
0
0
28 Oct 2025
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
Bohan Li
Wenbin Huang
Yuhang Qiu
Yiwei Guo
Hankun Wang
Zhihan Li
Jing Peng
Ziyang Ma
Xie Chen
Kai Yu
AuLLM
20
0
0
27 Oct 2025
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Li Zhou
Lutong Yu
You Lyu
Yihang Lin
Zefeng Zhao
Junyi Ao
Yuhao Zhang
Benyou Wang
Haizhou Li
AuLLM
12
0
0
26 Oct 2025
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
Michael Ungersböck
Florian Grötschla
Luca A. Lanzendörfer
June Young Yi
Changho Choi
Roger Wattenhofer
AuLLM
12
0
0
26 Oct 2025
Are These Even Words? Quantifying the Gibberishness of Generative Speech Models
Danilo de Oliveira
Tal Peer
Jonas Rochdi
Timo Gerkmann
AuLLM
12
0
0
24 Oct 2025
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Yejin Kwon
Taewoo Kang
Hyunsoo Yoon
Changouk Kim
AuLLM, ELM, LRM
44
0
0
22 Oct 2025
Which Evaluation for Which Model? A Taxonomy for Speech Model Assessment
Maureen de Seyssel
Eeshan Gunesh Dhekane
AuLLM, ELM
12
0
0
22 Oct 2025
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs
Brandon James Carone
Iran R. Roman
Pablo Ripollés
AuLLM, LRM
8
1
0
21 Oct 2025
Can large audio language models understand child stuttering speech? speech summarization, and source separation
Chibuzor Okocha
Maya Bakri
Christan Grant
AuLLM
32
0
0
21 Oct 2025
End-to-end Listen, Look, Speak and Act
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
Jun Zhang
Lu Lu
C. Zhang
AuLLM
46
0
0
19 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELM, AuLLM
29
0
0
19 Oct 2025
VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency
Hongcheng Liu
Yixuan Hou
Heyang Liu
Yuhao Wang
Yanfeng Wang
Y Samuel Wang
AuLLM
24
0
0
17 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
28
0
0
17 Oct 2025
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
Xiaohan Zhao
Hongyu Xiang
Shengze Ye
Song Li
Zhengkun Tian
Guanyu Chen
Ke Ding
Guanglu Wan
AuLLM
20
0
0
17 Oct 2025
AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
Yueqian Lin
Zhengmian Hu
Jayakumar Subramanian
Qinsi Wang
N. Vlassis
Hai Helen Li
Yiran Chen
LLMAG, AuLLM, LRM
61
1
0
17 Oct 2025
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
Hui Wang
J. Zhao
Yifan Yang
Shujie Liu
Junyang Chen
...
Jinyu Li
Jiaming Zhou
Haoqin Sun
Yan Lu
Yong Qin
AuLLM, ELM
42
0
0
16 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
68
0
0
15 Oct 2025
Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module
Ruitao Feng
Bixi Zhang
Sheng Liang
Zheng Yuan
AuLLM, MoE, LLMSV
25
0
0
15 Oct 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Zhenyu Liu
Yunxin Li
Xuanyu Zhang
Qixun Teng
Shenyuan Jiang
...
Mingjun Zhao
Yu-Syuan Xu
Yancheng He
Baotian Hu
Min Zhang
AuLLM, MoE
62
0
0
15 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM, LRM
81
0
0
13 Oct 2025
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
Kuan-Yi Lee
Tsung-En Lin
Hung-yi Lee
AuLLM, LRM
27
0
0
13 Oct 2025
Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
Jingyi Chen
Zhimeng Guo
Jiyun Chun
Pichao Wang
Andrew Perrault
Micha Elsner
AuLLM
12
0
0
12 Oct 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Caorui Li
Yu Chen
Yiyan Ji
Jin Xu
Zhenyu Cui
...
Zili Wang
Minghao Liu
Junran Peng
Zhaoxiang Zhang
Jiaheng Liu
AuLLM, LRM
16
2
0
12 Oct 2025
End-to-end Automatic Speech Recognition and Speech Translation: Integration of Speech Foundational Models and LLMs
Nam Luu
Ondřej Bojar
AuLLM
46
0
0
11 Oct 2025
The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
Nizar El Ghazal
Antoine Caubrière
Valentin Vielzeuf
AuLLM
43
0
0
10 Oct 2025
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
Hanke Xie
Dake Guo
C. Wang
Yue Li
WenJie Tian
...
Xinsheng Wang
Xiulin Li
Guanqiong Miao
B. Liu
Lei Xie
AuLLM
81
0
0
09 Oct 2025
VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Dhruv Jain
Harshit Shukla
Gautam Rajeev
Ashish Kulkarni
Chandra Khatri
Shubham Agarwal
AuLLM, ELM
61
0
0
09 Oct 2025
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Peize He
Zichen Wen
Yubo Wang
Y. Wang
Xiaoqian Liu
...
Zhifei Liu
Weijia Li
C. Wang
Conghui He
Linfeng Zhang
AuLLM
77
0
0
08 Oct 2025
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMs with Audio-visual Cues
Krish Patel
Dingkun Zhou
Ajay Kankipati
Akshaj Gupta
Zeyi Austin Li
...
Guan-Ting Lin
Kan Jen Cheng
Huang-Cheng Chou
Jiachen Lian
Gopala Anumanchipalli
AuLLM
8
1
0
08 Oct 2025
Robustness assessment of large audio language models in multiple-choice evaluation
F. López
Santosh Kesiraju
Jordi Luque
AuLLM, ELM
64
0
0
06 Oct 2025
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Wenhao Guan
Zhikang Niu
Ziyue Jiang
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
Xie Chen
AuLLM
89
0
0
06 Oct 2025
AudioToolAgent: An Agentic Framework for Audio-Language Models
Gijs Wijngaard
Elia Formisano
M. Dumontier
LLMAG, AuLLM
43
0
0
03 Oct 2025
Transcribe, Translate, or Transliterate: An Investigation of Intermediate Representations in Spoken Language Models
Tolúlọpé Ògúnrèmí
Christopher D. Manning
Dan Jurafsky
Karen Livescu
AuLLM
74
0
0
02 Oct 2025
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Donghang Wu
H. Zhang
Chen Chen
Tianyu Zhang
Fei Tian
...
Gang Yu
Hexin Liu
Nana Hou
Yuchen Hu
Eng Siong Chng
AuLLM, KELM, AI4CE, LRM
139
0
0
02 Oct 2025
Backdoor Attacks Against Speech Language Models
Alexandrine Fortier
Thomas Thebaud
Jesus Villalba
Najim Dehak
P. Cardinal
AuLLM
48
0
0
01 Oct 2025
PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
Yujia Xiao
Liumeng Xue
Lei He
Xinyi Chen
Aemon Yat Fei Chiu
...
Shaofei Zhang
Qiuqiang Kong
Xinfa Zhu
Wei Xue
Tan Lee
AuLLM, VGen
16
0
0
01 Oct 2025
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
Chen-An Li
Tzu-Han Lin
Hung-yi Lee
AuLLM
20
0
0
01 Oct 2025
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Xingjian Zhao
Zhe Xu
Qinyuan Cheng
Zhaoye Fei
Luozhijie Jin
...
Yitian Gong
Yuanfan Xu
Yaqian Zhou
Xuanjing Huang
Xipeng Qiu
AuLLM
93
0
0
01 Oct 2025
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
Kai-Wei Chang
En-Pei Hu
Chun-Yi Kuan
Wenze Ren
Wei-Chih Chen
Guan-Ting Lin
Yu Tsao
Shao-Hua Sun
Hung-yi Lee
James R. Glass
AuLLM
43
2
0
30 Sep 2025
Optimizing Speech Language Models for Acoustic Consistency
Morteza Rohanian
Michael Krauthammer
AuLLM
16
0
0
30 Sep 2025
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
Yi-Cheng Lin
Yu-Hua Chen
Jia-Kai Dong
Yueh-Hsuan Huang
Szu-Chi Chen
...
I-Ning Tsai
H. Wang
Ho-Lam Chung
Ke-Han Lu
Hung-yi Lee
AuLLM, VLM
11
0
0
30 Sep 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Chengyao Wang
Zhisheng Zhong
Bohao Peng
Senqiao Yang
Yuqi Liu
Haokun Gui
Bin Xia
Jingyao Li
Bei Yu
Jiaya Jia
MLLM, AuLLM, VLM
46
1
0
29 Sep 2025
VoiceBridge: Designing Latent Bridge Models for General Speech Restoration at Scale
Chi Zhang
Zehua Chen
Kaiwen Zheng
Jun Zhu
AuLLM
60
0
0
28 Sep 2025
AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
Wenyu Li
Xiaoqi Jiao
Yi Chang
Guangyan Zhang
Yiwen Guo
AuLLM
22
0
0
27 Sep 2025
Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng
Shilin Zhou
Chen Gong
Zhenghua Li
AuLLM, LRM
31
0
0
26 Sep 2025
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
Zhen Xiong
Yujun Cai
Zhecheng Li
Junsong Yuan
Yiwei Wang
AuLLM, LRM
95
0
0
26 Sep 2025
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
Ke Wang
Houxing Ren
Zimu Lu
Mingjie Zhan
Hongsheng Li
AuLLM, ELM
5
0
0
26 Sep 2025