Towards Reliable Large Audio Language Model

25 May 2025
Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
Main: 8 pages, 6 figures, 12 tables; Bibliography: 2 pages; Appendix: 5 pages
Abstract

Recent advancements in large audio language models (LALMs) have demonstrated impressive results and promising prospects in universal understanding and reasoning across speech, music, and general sound. However, these models still lack the ability to recognize their knowledge boundaries and proactively refuse to answer questions they do not know. While there have been successful attempts to enhance the reliability of LLMs, reliable LALMs remain largely unexplored. In this paper, we systematically investigate various approaches towards reliable LALMs, including training-free methods such as multi-modal chain-of-thought (MCoT) and training-based methods such as supervised fine-tuning (SFT). In addition, we identify the limitations of previous evaluation metrics and propose a new metric, the Reliability Gain Index (RGI), to assess the effectiveness of different reliability-enhancing methods. Our findings suggest that both training-free and training-based methods enhance the reliability of LALMs to different extents. Moreover, we find that awareness of reliability is a "meta ability" that can be transferred across different audio modalities, even though sound, music, and speech differ significantly in structure and content.
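The paper's Reliability Gain Index (RGI) is defined in the full text and is not reproduced here. As a rough, hypothetical sketch of the bookkeeping such a reliability evaluation involves, the Python snippet below tallies correct answers, wrong answers, and refusals for a question-answering LALM and reports a naive reliability-style score. The class, function, and field names (Prediction, reliability_summary, answer, correct) are illustrative assumptions, not the paper's implementation.

# Illustrative only: a toy tally for a QA-style LALM that may refuse to answer.
# This is NOT the paper's RGI metric; all names here are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prediction:
    answer: Optional[str]  # None means the model refused to answer
    correct: bool          # ignored when answer is None

def reliability_summary(preds: list) -> dict:
    n = len(preds)
    refused = sum(p.answer is None for p in preds)
    answered = n - refused
    correct = sum(p.answer is not None and p.correct for p in preds)
    wrong = answered - correct
    return {
        "accuracy_on_answered": correct / answered if answered else 0.0,
        "refusal_rate": refused / n if n else 0.0,
        # Naive score: reward correct answers, penalize confident errors,
        # and treat refusals as neutral.
        "naive_reliability": (correct - wrong) / n if n else 0.0,
    }

if __name__ == "__main__":
    demo = [
        Prediction("dog barking", True),   # answered correctly
        Prediction("violin", False),       # answered incorrectly
        Prediction(None, False),           # refused: outside its knowledge
    ]
    print(reliability_summary(demo))

A metric along these lines separates "answers well when it answers" from "knows when not to answer," which is the distinction the abstract draws between raw accuracy and reliability.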

@article{ma2025_2505.19294,
  title={Towards Reliable Large Audio Language Model},
  author={Ziyang Ma and Xiquan Li and Yakun Song and Wenxi Chen and Chenpeng Du and Jian Wu and Yuanzhe Chen and Zhuo Chen and Yuping Wang and Yuxuan Wang and Xie Chen},
  journal={arXiv preprint arXiv:2505.19294},
  year={2025}
}