Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks

11 June 2020

Kathrin Grosse

Battista Biggio

Michael Backes

Abstract

Backdoor attacks aim to mislead machine-learning models into outputting an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data or compromising the learning algorithm, e.g., by injecting poisoning samples that contain the trigger, along with the desired class label, into the training set. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue. In particular, we unveil that backdoor attacks work by inducing a smoother decision function around the triggered samples, a phenomenon which we refer to as backdoor smoothing. We quantify backdoor smoothing by defining a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that the phenomenon is more pronounced for more successful attacks. However, our experiments also show that patterns fulfilling backdoor smoothing can be crafted even without poisoning the training data. Although our measure may not be directly exploited as a defense mechanism, it unveils an important phenomenon which may pave the way towards understanding the limitations of current defenses that rely on a smooth decision output for backdoors.
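
The abstract does not spell out the exact form of the uncertainty measure, so the following is only a minimal sketch under an assumed stand-in: smoothness around an input x is approximated by the average per-class variance of a classifier's predicted probabilities over Gaussian perturbations of x. The names `local_uncertainty` and `make_linear_classifier`, the noise scale `sigma`, and the toy linear softmax model are all hypothetical illustrations, not the authors' method. Under this proxy, a smoother decision function yields a lower value, so backdoor smoothing would show up as lower values for triggered inputs than for clean ones.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def make_linear_classifier(W, b):
    # Hypothetical toy model: maps a batch of inputs (n, d) to class
    # probabilities (n, n_classes).
    def predict_proba(X):
        return softmax(X @ W + b)
    return predict_proba

def local_uncertainty(predict_proba, x, sigma=0.1, n_samples=500, seed=0):
    """Assumed smoothness proxy (not the paper's exact measure):
    mean per-class variance of predicted probabilities over Gaussian
    perturbations of x. Lower values mean a smoother output around x."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    probs = predict_proba(x[None, :] + noise)  # shape (n_samples, n_classes)
    return float(probs.var(axis=0).mean())

# Usage: compare a sharp and a smooth classifier at a point on the boundary.
x = np.zeros(2)
b = np.zeros(2)
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
sharp = make_linear_classifier(10.0 * W, b)
smooth = make_linear_classifier(0.1 * W, b)
print(local_uncertainty(sharp, x), local_uncertainty(smooth, x))
```

Fixing the random seed makes the comparison repeatable; in practice one would evaluate the same quantity on clean and trigger-stamped versions of the same inputs and compare the two distributions.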
