Replay Attacks Against Audio Deepfake Detection

We show how replay attacks undermine audio deepfake detection: by playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to detection models. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the top-performing W2V2-AASIST model's Equal Error Rate (EER) surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised, with an 11.0% EER. We release ReplayDF for non-commercial research use.
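To make the two key quantities concrete, here is a minimal sketch (not code from the paper; the function names and the choice of libraries are illustrative) of (a) approximating a replay channel by convolving audio with a room impulse response, as in RIR-based retraining, and (b) computing the Equal Error Rate from detector scores, assuming higher scores indicate bona fide audio.

```python
import numpy as np
from scipy.signal import fftconvolve
from sklearn.metrics import roc_curve


def simulate_replay(waveform: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Approximate a replay channel by convolving the signal with a room impulse response."""
    replayed = fftconvolve(waveform, rir, mode="full")[: len(waveform)]
    peak = np.max(np.abs(replayed))
    # Normalize to avoid clipping introduced by the convolution.
    return replayed / peak if peak > 0 else replayed


def equal_error_rate(labels: np.ndarray, scores: np.ndarray) -> float:
    """EER: the operating point where false-accept and false-reject rates are equal.

    labels: 1 for bona fide, 0 for spoofed; scores: higher = more likely bona fide.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2.0)
```

Note that this RIR convolution only models the acoustic channel; a physical replay (real loudspeaker plus microphone) additionally introduces transducer nonlinearities and ambient noise, which is consistent with the residual 11.0% EER after RIR retraining reported above.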
@article{müller2025_2505.14862,
  title   = {Replay Attacks Against Audio Deepfake Detection},
  author  = {Nicolas Müller and Piotr Kawa and Wei-Herng Choong and Adriana Stan and Aditya Tirumala Bukkapatnam and Karla Pizzi and Alexander Wagner and Philip Sperl},
  journal = {arXiv preprint arXiv:2505.14862},
  year    = {2025}
}