21
0

ALLM4ADD\mathcal{A}LLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection

Abstract

Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio large language models (ALLMs) have made significant progress in various audio processing tasks, a heuristic question arises: Can ALLMs be leveraged to solve ADD?. In this paper, we first conduct a comprehensive zero-shot evaluation of ALLMs on ADD, revealing their ineffectiveness in detecting fake audio. To enhance their performance, we propose ALLM4ADD\mathcal{A}LLM4ADD, an ALLM-driven framework for ADD. Specifically, we reformulate ADD task as an audio question answering problem, prompting the model with the question: "Is this audio fake or real?". We then perform supervised fine-tuning to enable the ALLM to assess the authenticity of query audio. Extensive experiments are conducted to demonstrate that our ALLM-based method can achieve superior performance in fake audio detection, particularly in data-scarce scenarios. As a pioneering study, we anticipate that this work will inspire the research community to leverage ALLMs to develop more effective ADD systems.

View on arXiv
@article{gu2025_2505.11079,
  title={ $\mathcal{A}LLM4ADD$: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection },
  author={ Hao Gu and Jiangyan Yi and Chenglong Wang and Jianhua Tao and Zheng Lian and Jiayi He and Yong Ren and Yujie Chen and Zhengqi Wen },
  journal={arXiv preprint arXiv:2505.11079},
  year={ 2025 }
}
Comments on this paper