2305.11834
Pengi: An Audio Language Model for Audio Tasks
19 May 2023
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
Papers citing "Pengi: An Audio Language Model for Audio Tasks"
46 / 46 papers shown
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma
Siyou Li
Juntao Yu
Emmanouil Benetos
Akira Maezawa
AuLLM
VLM
33
0
0
14 Jun 2025
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita
Tomoya Mizumoto
Atsushi Kojima
Lianbo Liu
Yui Sudo
AuLLM
121
0
0
12 Jun 2025
CoLMbo: Speaker Language Model for Descriptive Profiling
Massa Baali
Shuo Han
Syed Abdul Hannan
Purusottam Samal
Karanveer Singh
Soham Deshmukh
Rita Singh
Bhiksha Raj
AuLLM
93
0
0
11 Jun 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang
B. Li
Bruce Wang
Boyong Wu
Chao Yan
...
X. Zhang
Yibo Zhu
Daxin Jiang
Shuchang Zhou
Chen-Hao Hu
AuLLM
75
0
0
10 Jun 2025
Teaching Physical Awareness to LLMs through Sounds
Weiguo Wang
Andy Nie
Wenrui Zhou
Yi Kai
Chengchen Hu
40
0
0
10 Jun 2025
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Wenyu Zhang
Yingxu He
Geyu Lin
Zhuohan Liu
Shuo Sun
...
Jeremy H.M Wong
Qiongqiong Wang
Hardik B. Sailor
Nancy F. Chen
Ai Ti Aw
AuLLM
42
0
0
07 Jun 2025
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
Bikash Dutta
Rishabh Ranjan
Shyam Sathvik
Mayank Vatsa
Richa Singh
22
0
0
07 Jun 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
Xingjian Diao
Tianzhen Yang
Chunhui Zhang
Weiyi Wu
Ming Cheng
Jiang Gui
76
1
0
02 Jun 2025
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang
Wenbo Hu
Yinpeng Dong
Jing Liu
Hanwang Zhang
Richang Hong
71
2
0
02 Jun 2025
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
Xin Jing
Jiadong Wang
Iosif Tsangko
Andreas Triantafyllopoulos
Björn Schuller
31
0
0
30 May 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
81
0
0
26 May 2025
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
Pooneh Mousavi
Shubham Gupta
Cem Subakan
Mirco Ravanelli
53
0
0
24 May 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Junlin Li
Guodong DU
Jing Li
Sim Kuan Goh
Wenya Wang
...
Fangming Liu
Jing Li
Saleh Alharbi
Daojing He
Min Zhang
MoMe
CLL
144
1
0
21 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
111
0
0
14 May 2025
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Tsai-Ning Wang
Lin-Lin Chen
Neil Zeghidour
Aaqib Saeed
AuLLM
LM&MA
421
0
0
02 May 2025
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Yongyi Zang
Sean O'Brien
Taylor Berg-Kirkpatrick
Julian McAuley
Cheng-i Wang
AuLLM
151
2
0
01 Apr 2025
Position: Interactive Generative Video as Next-Generation Game Engine
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xintao Wang
Pengfei Wan
Di Zhang
Xihui Liu
VGen
119
4
0
21 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
135
1
0
12 Mar 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
174
1
0
18 Feb 2025
From No to Know: Taxonomy, Challenges, and Opportunities for Negation Understanding in Multimodal Foundation Models
Mayank Vatsa
Aparna Bharati
S. Mittal
Richa Singh
112
0
0
10 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
175
4
0
28 Jan 2025
AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM
RALM
VLM
91
0
0
17 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
254
134
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
193
3
0
10 Jan 2025
"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?
Benjamin Z. Reichman
Kartik Talamadupula
102
0
0
07 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
Chun-Yi Kuan
Hung-yi Lee
AuLLM
LRM
179
7
0
03 Jan 2025
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
216
3
0
03 Jan 2025
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
195
12
0
15 Dec 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
174
8
0
23 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
143
5
0
20 Oct 2024
Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task
Jozef Coldenhoff
Milos Cernak
105
0
0
21 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
114
12
0
18 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
95
2
0
14 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
Wentao Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
Ai Ti Aw
AuLLM
123
1
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
119
54
0
10 Sep 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
169
35
0
23 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
118
44
0
14 May 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
107
68
0
31 Mar 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
99
22
0
08 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
127
19
0
02 Feb 2024
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
106
133
0
11 Dec 2023
Joint Music and Language Attention Models for Zero-shot Music Tagging
Xingjian Du
Zhesong Yu
Jiaju Lin
Bilei Zhu
Qiuqiang Kong
BDL
VLM
65
9
0
16 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Muhammad Ahmed Shah
Roshan S. Sharma
Hira Dhamyal
R. Olivier
Ankit Shah
...
Massa Baali
Soham Deshmukh
Michael Kuhlmann
Bhiksha Raj
Rita Singh
AAML
67
21
0
02 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
119
58
0
30 Sep 2023
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
80
6
0
18 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
202
39
0
24 Aug 2023