Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

21 March 2025

Abstract

Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming that the providers adhere to prescribed training protocols with a predefined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to maintain high detection accuracy across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework for the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate centered kernel alignment to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over state-of-the-art (SoTA) baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first method to effectively detect backdoors in multimodal large language models, highlighting its broad applicability and advancing secure deep learning.
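
The abstract names centered kernel alignment (CKA) as the feature-similarity measure but does not say which variant is used. As a minimal sketch, assuming the common linear CKA (Kornblith et al., 2019), the function below compares two feature matrices computed on the same inputs. Because it only needs the sample axis to match, it works even when the two models have different feature widths, which is why CKA suits comparing models from independent providers. This is not the authors' implementation; `center_columns`, `cross_fro_sq`, and `linear_cka` are hypothetical names chosen for illustration.

```python
from typing import List

# Rows are samples (the same inputs fed to both models); columns are feature dimensions.
Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean over the samples."""
    n = len(m)
    width = len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(width)]
    return [[row[j] - means[j] for j in range(width)] for row in m]


def cross_fro_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b (a and b share the sample axis)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F); widths may differ."""
    if len(x) != len(y):
        raise ValueError("x and y must contain features for the same samples")
    xc, yc = center_columns(x), center_columns(y)
    # sqrt of the product of squared norms equals the product of the norms.
    denom = (cross_fro_sq(xc, xc) * cross_fro_sq(yc, yc)) ** 0.5
    if denom == 0.0:
        raise ValueError("CKA is undefined for constant features")
    return cross_fro_sq(xc, yc) / denom
```

For example, `linear_cka(feats_a, feats_b)`, with hypothetical activation matrices from two providers' models on the same inputs, returns a value in [0, 1]; it equals 1 when the representations agree up to an orthogonal transform and isotropic scaling. How the paper turns such similarity scores into trigger recovery is not described in the abstract.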

@article{wang2025_2503.16872,
  title={Lie Detector: Unified Backdoor Detection via Cross-Examination Framework},
  author={Xuan Wang and Siyuan Liang and Dongping Liao and Han Fang and Aishan Liu and Xiaochun Cao and Yu-liang Lu and Ee-Chien Chang and Xitong Gao},
  journal={arXiv preprint arXiv:2503.16872},
  year={2025}
}