Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.19168
Cited By
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
24 October 2024
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark"
37 / 37 papers shown
Title
CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation
Naihao Deng
Kapotaksha Das
Rada Mihalcea
Vitaliy Popov
M. Abouelenien
25
0
0
15 Jun 2025
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma
Siyou Li
Juntao Yu
Emmanouil Benetos
Akira Maezawa
AuLLM
VLM
29
0
0
14 Jun 2025
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
Tzu-wen Hsu
Ke-Han Lu
Cheng-Han Chiang
Hung-yi Lee
AuLLM
28
0
0
08 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
99
0
0
05 Jun 2025
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei
Yu Miao
Dongzhan Zhou
Di Hu
104
0
0
05 Jun 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
79
0
0
26 May 2025
Towards Reliable Large Audio Language Model
Ziyang Ma
Xiquan Li
Yakun Song
Wenxi Chen
Chenpeng Du
...
Y. Chen
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
AuLLM
74
0
0
25 May 2025
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu
Chun-Yi Kuan
Hung-yi Lee
AuLLM
ELM
65
3
0
25 May 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM
AI4TS
44
0
0
25 May 2025
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
Yiming Gao
Bin Wang
Chengwei Wei
Shuo Sun
AiTi Aw
MLLM
AuLLM
56
0
0
22 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLM
ReLM
LRM
66
0
0
21 May 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma
Yinghao Ma
Yanqiao Zhu
Chen Yang
Yi-Wen Chao
...
Wei Xue
Emmanouil Benetos
Kai Yu
Xiaofeng Wang
Xie Chen
AuLLM
LRM
114
1
0
19 May 2025
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Debarpan Bhattacharya
Apoorva Kulkarni
Sriram Ganapathy
79
0
0
19 May 2025
Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation
Qiongqiong Wang
Hardik B. Sailor
Tianchi Liu
Ai Ti Aw
105
0
0
19 May 2025
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
Chih-Kai Yang
Neo Ho
Yen-Ting Piao
Hung-yi Lee
AuLLM
LRM
175
4
0
19 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
103
0
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
121
0
0
14 May 2025
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge
Chao-Han Huck Yang
Sreyan Ghosh
Qing Wang
Jaeyeon Kim
Hengyi Hong
...
Tianyi Zhou
Gunhee Kim
Jun Du
Rafael Valle
Bryan Catanzaro
71
0
0
12 May 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM
VLM
185
13
0
25 Apr 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
127
6
0
22 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
155
14
0
11 Apr 2025
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Yongyi Zang
Sean O'Brien
Taylor Berg-Kirkpatrick
Julian McAuley
Cheng-i Wang
AuLLM
144
2
0
01 Apr 2025
The Deployment of End-to-End Audio Language Models Should Take into Account the Principle of Least Privilege
Luxi He
Xiangyu Qi
Michel Liao
Inyoung Cheong
Prateek Mittal
Danqi Chen
Peter Henderson
AuLLM
106
0
0
21 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yize Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
213
31
0
16 Mar 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Gang Li
Jizhong Liu
Heinrich Dinkel
Yadong Niu
Junbo Zhang
Jian Luan
OffRL
LRM
ReLM
165
12
0
14 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
113
4
0
11 Mar 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
Feng Jiang
Zhiyu Lin
Fan Bu
Yuhao Du
Benyou Wang
Haoyang Li
AuLLM
ELM
130
2
0
07 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
126
21
0
06 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Ziqiang Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
155
17
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Yanzhe Zhang
Xiren Zhou
MoE
SyDa
122
70
0
03 Mar 2025
Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models
Heeseung Kim
Che Hyun Lee
Sangkwon Park
Jiheum Yeom
Nohil Park
Sangwon Yu
Sungroh Yoon
130
1
0
27 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
112
2
0
24 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM
AuLLM
CLIP
VLM
93
1
0
23 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
164
4
0
28 Jan 2025
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
Zejun Ma
Zhuo Chen
Yansen Wang
Xiaofeng Wang
Xie Chen
AuLLM
LRM
123
15
0
13 Jan 2025
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
203
25
0
01 Oct 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
169
35
0
23 Jun 2024
1