
Rethinking the Trigger of Backdoor Attack

Abstract

Backdoor attacks raise serious security concerns about obtaining or training models through third-party platforms. A backdoor attacker stamps a specific trigger (i.e., a local patch) onto some training images so that the trained model misclassifies test images carrying the same trigger while still correctly predicting benign test images. Many backdoor attack and defense methods have been proposed, yet the properties and behavior of attacked models remain under-studied. In this paper, we begin by studying the properties of the backdoor trigger. Most existing works adopt the setting that the triggers in training and testing images share the same appearance and location. However, we demonstrate that this attack paradigm is fragile when the trigger in testing images is inconsistent with the one used for training: if the appearance or location of the trigger changes even slightly, the attack performance may degrade sharply. Inspired by this property, we further verify that existing attacks are \emph{transformation vulnerable}. In other words, applying a transformation-based pre-processing (e.g., flipping or scaling) to the test image before prediction effectively defends against many state-of-the-art backdoor attacks. This simple strategy achieves defense performance on par with state-of-the-art defenses at nearly no extra computational cost. Furthermore, we propose a transformation-based attack enhancement that improves the robustness of existing attacks against transformation-based defenses. Extensive experiments verify that the enhanced attack is robust to transformations and also remains effective in the physical-attack setting.
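The transformation-based defense sketched in the abstract can be illustrated with a minimal NumPy example. The function name, the default scale, and the subsample-and-pad rescaling below are illustrative assumptions, not the paper's implementation; the point is only that a flip plus a mild rescale misaligns a trigger whose exact appearance and location the backdoored model memorized.

```python
import numpy as np

def transform_defense(image, scale=0.9):
    """Apply a horizontal flip plus a mild rescale to an input image before
    classification. A fixed trigger patch no longer lines up with what the
    backdoored model memorized, so the attack success rate drops, while the
    benign prediction is largely unaffected. Names and defaults here are
    illustrative, not taken from the paper."""
    # 1) Horizontal flip: moves a corner trigger to the opposite side.
    flipped = image[:, ::-1]

    # 2) Shrink by `scale` via simple index subsampling, then zero-pad back
    #    to the original size, which shifts and resizes any fixed trigger.
    h, w = image.shape[:2]
    new_h, new_w = int(h * scale), int(w * scale)
    rows = np.linspace(0, h - 1, new_h).astype(int)
    cols = np.linspace(0, w - 1, new_w).astype(int)
    shrunk = flipped[np.ix_(rows, cols)]
    padded = np.zeros_like(image)
    padded[:new_h, :new_w] = shrunk
    return padded
```

In practice one would run the classifier on `transform_defense(x)` instead of `x`; since the transformation is a single flip and resize, the extra cost per prediction is negligible, matching the abstract's claim.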
